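Title (Chinese): 矩陣分解法對網路評比資料分析之探討
Title (English): Matrix Factorization Techniques for Analysis of Online Rating Data
Author: 張良卉
Contributors: 翁久幸 (Ruby Chui-Hsing Weng), advisor; 張良卉
Keywords: recommender systems; collaborative filtering recommender systems; latent factor models; matrix factorization
Date: 2012
Uploaded: 1-Jul-2013 17:01:54 (UTC+8)
Abstract (translated from the Chinese):

With advances in technology and the spread of the internet, we live in an age of information explosion. Many corporate websites and online stores use recommender systems when selling products to consumers: merchants use the internet as a marketing channel, consumers search the internet for the products they want, and recommender systems emerged in this environment.

A recommender system recommends information or physical goods that a user is likely to like, based on the user's characteristics or preferences. Recommender systems fall into two broad categories. The first is the content filtering approach, which assigns a set of attributes to every item and then matches the user's profile against the item attributes to recommend the items that best fit that user's preferences. The second is the collaborative filtering approach, which exploits relationships among users or among items. Its premise is that items liked by people whose tastes resemble the user's, or items similar to those the user already likes, are likely to appeal to the user as well. Its convenience is that it needs nothing but the users' ratings of items.

Collaborative filtering in turn comprises two areas: neighborhood methods and latent factor models. Neighborhood methods focus on relationships between users or between items; depending on which relationship they use, they are further divided into item-based and user-based methods. Latent factor models instead try to uncover the latent factors underlying the ratings that users give to items; matrix factorization is one such method.

This study examines matrix factorization, one of the latent factor methods in collaborative filtering. Matrix factorization predicts a rating from the user's preferences for item characteristics and the characteristics the item possesses. Other factors also influence ratings, however, such as each user's own rating standards or the intrinsic quality of an item. These bias-inducing factors can be added to the matrix factorization model to adjust its predictions. This study therefore investigates whether matrix factorization with added bias terms produces more accurate predictions.

The study analyzes the MovieLens data from the GroupLens Research Project at the University of Minnesota. The empirical analysis shows that matrix factorization with bias terms does predict more accurately than plain matrix factorization, at the cost of more computation time.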
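In the notation popularized by Koren, Bell and Volinsky (2009), the plain and biased models described above are commonly written as follows. This is a sketch of that standard formulation, not a transcription of the thesis: the symbols (κ for the set of observed user-item pairs, μ for the global mean rating, b_u and b_i for the user and item biases, p_u and q_i for f-dimensional latent factor vectors, λ for the regularization weight) are assumptions, and the thesis's own notation may differ.

```latex
\begin{aligned}
\text{MF:}\quad & \hat{r}_{ui} = q_i^{\top} p_u \\
& \min_{p_*,\,q_*} \sum_{(u,i)\in\kappa} \bigl(r_{ui} - q_i^{\top} p_u\bigr)^2
  + \lambda\bigl(\lVert q_i\rVert^2 + \lVert p_u\rVert^2\bigr) \\[4pt]
\text{Biased MF:}\quad & \hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u \\
& \min_{p_*,\,q_*,\,b_*} \sum_{(u,i)\in\kappa} \bigl(r_{ui} - \mu - b_u - b_i - q_i^{\top} p_u\bigr)^2
  + \lambda\bigl(\lVert q_i\rVert^2 + \lVert p_u\rVert^2 + b_u^2 + b_i^2\bigr)
\end{aligned}
```

Here b_u plays the role of the user's own rating standard and b_i the item's intrinsic quality mentioned in the abstract; setting μ, b_u and b_i to zero recovers plain matrix factorization.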
Abstract (English): The explosive growth of the internet has led to information overload. Electronic retailers and content providers use recommender systems to meet a variety of special needs and tastes: retailers use the internet as a marketing channel, consumers use it to find the products they want, and recommender systems arose in this setting. Such systems are particularly useful for entertainment products such as movies, music, and TV shows. Recommender systems recommend products or information that users may like, based on the users' characteristics and preferences. They follow two broad strategies. The content filtering approach creates a profile for each user or product to characterize its nature. The collaborative filtering approach relies only on past user behavior, without requiring explicit profiles; it analyzes relationships between users and interdependencies among products to identify new user-item associations. The two primary areas of collaborative filtering are neighborhood methods and latent factor models. Neighborhood methods compute the relationships between items or, alternatively, between users. Latent factor models instead try to explain the ratings by characterizing both items and users on factors inferred from the rating patterns. Matrix factorization techniques are among the most successful realizations of latent factor models. One benefit of matrix factorization for collaborative filtering is its flexibility in handling various data aspects and other application-specific requirements. The basic model captures the interactions between users and items that produce different rating values. However, much of the observed variation in ratings is due to effects associated with either users or items, known as biases or intercepts, independent of any interactions.
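The sketch below, which is not the thesis's code, fits both the plain and the biased model by stochastic gradient descent, one of the two learning algorithms described by Koren, Bell and Volinsky (2009) (the other is alternating least squares). The function names, the hyperparameters (k, lr, reg, epochs) and the synthetic data generator are hypothetical; the thesis used MovieLens data, and its learning procedure and settings are not stated in this record.

```python
import math
import random


def train_mf(ratings, n_users, n_items, k=3, lr=0.01, reg=0.05,
             epochs=50, use_bias=True, seed=0):
    """Fit matrix factorization by stochastic gradient descent (SGD).

    ratings: list of (user, item, rating) triples with 0-based ids.
    use_bias=False gives plain MF:  r_hat = q_i . p_u
    use_bias=True gives biased MF:  r_hat = mu + b_u + b_i + q_i . p_u
    Returns a function predict(u, i).
    """
    rng = random.Random(seed)
    # Small random latent factors; biases start at zero.
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    b_user = [0.0] * n_users
    b_item = [0.0] * n_items
    # Global mean mu is fixed from the training data (0 for plain MF).
    mu = sum(r for _, _, r in ratings) / len(ratings) if use_bias else 0.0

    def predict(u, i):
        dot = sum(P[u][f] * Q[i][f] for f in range(k))
        return mu + b_user[u] + b_item[i] + dot

    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(u, i)
            if use_bias:
                # Gradient steps on the regularized squared error for the biases.
                b_user[u] += lr * (err - reg * b_user[u])
                b_item[i] += lr * (err - reg * b_item[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return predict


def rmse(predict, ratings):
    """Root mean squared error of predict over (user, item, rating) triples."""
    se = sum((r - predict(u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


def synthetic_ratings(n_users=60, n_items=40, k=2, density=0.3, seed=1):
    """Hypothetical toy data: mean + user bias + item bias + interaction + noise."""
    rng = random.Random(seed)
    ub = [rng.gauss(0.0, 0.5) for _ in range(n_users)]
    ib = [rng.gauss(0.0, 0.5) for _ in range(n_items)]
    U = [[rng.gauss(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    out = []
    for u in range(n_users):
        for i in range(n_items):
            if rng.random() < density:
                inter = sum(a * b for a, b in zip(U[u], V[i]))
                out.append((u, i, 3.5 + ub[u] + ib[i] + inter + rng.gauss(0.0, 0.3)))
    return out


if __name__ == "__main__":
    data = synthetic_ratings()
    random.Random(2).shuffle(data)
    cut = int(0.8 * len(data))
    train, test = data[:cut], data[cut:]
    for use_bias in (False, True):
        model = train_mf(train, 60, 40, use_bias=use_bias)
        print(f"use_bias={use_bias}: test RMSE = {rmse(model, test):.3f}")
```

Running the script prints the held-out RMSE of each variant. Because the toy data are generated with user and item biases, the biased variant would be expected to do well on them, but results on real data such as MovieLens depend on the hyperparameters. In this sketch each SGD step of the biased model also updates two extra scalars per rating.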
This research investigates whether adding these biases to the matrix factorization model makes predictions more accurate, using the MovieLens data from the GroupLens Research Project at the University of Minnesota. We found that adding bias terms to matrix factorization improves prediction accuracy, although it requires slightly more computing time.
Description: Master's thesis
National Chengchi University
Graduate Institute of Statistics
100354028
101 (academic year, ROC calendar)
Source: http://thesis.lib.nccu.edu.tw/record/#G1003540281
Type: thesis
Identifier: G1003540281
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/58670
Table of contents:
1 Introduction
1.1 Research background
1.1.1 Overview of recommender systems
1.2 Research objectives
2 Literature review
2.1 How recommender systems work
2.2 Collaborative filtering recommender systems
2.2.1 Neighborhood methods
2.2.2 Latent factor models
2.3 Analyzing rating data with IRT (item response theory) models
3 Methodology
3.1 Matrix factorization
3.2 Matrix factorization with bias terms
4 Empirical study
4.1 Data
4.2 Matrix factorization analysis
4.2.1 Prediction results of matrix factorization
4.2.2 Prediction results of matrix factorization with bias terms
4.3 Comparison of prediction results
5 Conclusions and suggestions
References and bibliography
File size: 947281 bytes
Format: application/pdf
Language (dc.language.iso): en_US