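Title 矩陣分解法對網路評比資料分析之探討
Matrix Factorization Techniques for Analysis of Online Rating Data
Author 張良卉
Contributors 翁久幸
Ruby Chui-Hsing Weng
張良卉
Keywords Recommender systems
Collaborative filtering recommender systems
Latent factor models
Matrix factorization
Date 2012
Upload time 1-Jul-2013 17:01:54 (UTC+8)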
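Abstract   With advances in technology and the spread of the internet, we live in a society of information overload. Many corporate websites and online stores use recommender systems when selling products to consumers: merchants use the internet as a marketing channel, consumers use it to find the products they want, and recommender systems arose in this environment.

  Recommender systems recommend information or goods that a user is likely to enjoy, based on the user's characteristics or preferences. They operate in two broad ways. The first is the content filtering approach, which assigns a set of attributes to every item and matches each user's profile against those attributes to recommend items that better fit that user's preferences. The second is the collaborative filtering approach, which exploits relationships among users or among items. The idea is that a user is likely to enjoy items liked by people with similar tastes, as well as items similar to those the user already likes. Its convenience is that it needs only users' ratings of items to operate.
  Collaborative filtering in turn comprises two areas: neighborhood methods and latent factor models. Neighborhood methods focus on relationships between users or between items and, depending on which relationship is used, divide into item-based and user-based methods. Latent factor models try to uncover latent factors from the rating relationships between items and users; matrix factorization is one such method.

  This study examines matrix factorization, one of the latent factor model methods in collaborative filtering. Matrix factorization predicts ratings from users' preferences for item characteristics and the characteristics each item possesses. Other factors also influence ratings, however, such as a user's own rating standards or an item's inherent quality, so these bias-inducing factors can be added to the matrix factorization model to adjust its predictions. This study therefore asks whether matrix factorization with bias terms yields more accurate predictions.

  This study analyzes the MovieLens data from the GroupLens Research Project at the University of Minnesota. The empirical analysis shows that matrix factorization with bias terms does predict more accurately than plain matrix factorization, at the cost of more computation time.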
The explosive growth of the internet has led to information overload. Electronic retailers and content providers use recommender systems to meet a variety of special needs and tastes. Retailers use the internet as a marketing channel, and consumers use it to find the products they want; recommender systems emerged in this setting. Such systems are particularly useful for entertainment products such as movies, music, and TV shows.

Recommender systems suggest products or information that users are likely to enjoy, based on the users' characteristics and preferences. They follow one of two strategies. The first is the content filtering approach, which creates a profile for each user or product to characterize its nature. The second is the collaborative filtering approach, which relies only on past user behavior without requiring the creation of explicit profiles. Collaborative filtering analyzes relationships between users and interdependencies among products to identify new user-item associations.

The two primary areas of collaborative filtering are the neighborhood methods and latent factor models. Neighborhood methods are centered on computing the relationships between items or, alternatively, between users. Latent factor models are an alternative approach that tries to explain the ratings by characterizing both items and users on factors inferred from the ratings patterns. Matrix factorization techniques are some of the most successful realizations of latent factor models.
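
To make the neighborhood idea concrete, the following is a minimal sketch of an item-based neighborhood predictor. It is not taken from the thesis: the toy rating matrix, the function names `item_cosine_similarity` and `predict_item_based`, the use of cosine similarity, and the neighborhood size `k` are all assumptions chosen for illustration.

```python
import numpy as np

def item_cosine_similarity(R):
    """Cosine similarity between item columns of a user x item matrix.
    Zeros in R are treated as 'not rated' (an illustrative simplification)."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for unrated items
    return (R.T @ R) / np.outer(norms, norms)

def predict_item_based(R, user, item, k=2):
    """Predict R[user, item] as a similarity-weighted mean of the user's
    ratings on the k most similar items that the user has rated."""
    sim = item_cosine_similarity(R)[item]
    rated = np.flatnonzero(R[user] > 0)
    rated = rated[rated != item]
    if rated.size == 0:
        return float(R[R > 0].mean())  # fall back to the global mean
    top = rated[np.argsort(sim[rated])[::-1][:k]]
    weights = sim[top]
    if weights.sum() == 0:
        return float(R[user, rated].mean())
    return float(weights @ R[user, top] / weights.sum())

# Hypothetical toy ratings: rows are users, columns are items, 0 = not rated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

prediction = predict_item_based(R, user=0, item=2)
print(prediction)
```

In this toy matrix, item 2 is rated most like item 3, which user 0 rated low, so the prediction for user 0 on item 2 lands in the lower half of the rating scale. A user-based variant would apply the same computation to the rows of `R` instead of the columns.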

One benefit of the matrix factorization approach to collaborative filtering is its flexibility in dealing with various data aspects and other application-specific requirements. It tries to capture the interactions between users and items that produce the different rating values. However, much of the observed variation in rating values is due to effects associated with either users or items, known as biases or intercepts, independent of any interactions. This research investigates whether adding these biases to matrix factorization models makes predictions more accurate.
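
For reference, the two models being compared are commonly written as below, following the formulation in Koren, Bell, and Volinsky (2009). The notation is an assumption, since this record does not show the thesis's own symbols: $r_{ui}$ is the rating of item $i$ by user $u$, $p_u$ and $q_i$ are the user and item factor vectors, $\mu$ is the overall mean rating, $b_u$ and $b_i$ are the user and item biases, $\mathcal{K}$ is the set of observed ratings, and $\lambda$ is a regularization constant.

```latex
% Plain matrix factorization: the rating is explained by the interaction alone.
\hat{r}_{ui} = q_i^{\top} p_u

% Matrix factorization with bias terms: interaction plus user and item effects.
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u

% Regularized squared-error objective for the biased model.
\min_{p,\, q,\, b} \sum_{(u,i) \in \mathcal{K}}
  \left( r_{ui} - \mu - b_u - b_i - q_i^{\top} p_u \right)^2
  + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 + b_u^2 + b_i^2 \right)
```

Setting $\mu = b_u = b_i = 0$ in the second expression recovers the plain model, so the biased model adds one parameter per user and one per item.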

This research analyzed the MovieLens data from the GroupLens Research Project at the University of Minnesota. We found that adding bias terms to matrix factorization can improve the accuracy of prediction, though it requires somewhat more computing time.
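
The comparison described above can be sketched in a few lines. This is an illustrative sketch only, under several assumptions: it uses small synthetic ratings rather than MovieLens, trains by stochastic gradient descent (the record does not say which fitting procedure the thesis used), and the function names `train_mf` and `rmse`, the hyperparameters, and the random seeds are all hypothetical.

```python
import numpy as np

def train_mf(ratings, n_users, n_items, n_factors=2, use_bias=False,
             lr=0.01, reg=0.02, n_epochs=50, seed=0):
    """Fit matrix factorization by stochastic gradient descent.

    ratings: list of (user, item, rating) triples.
    Returns a function predict(user, item).
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(0.0, 0.1, (n_users, n_factors))  # user factors
    Q = rng.normal(0.0, 0.1, (n_items, n_factors))  # item factors
    # The plain model has no mean or bias terms; they stay at zero.
    mu = float(np.mean([r for _, _, r in ratings])) if use_bias else 0.0
    bu = np.zeros(n_users)
    bi = np.zeros(n_items)

    for _ in range(n_epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            if use_bias:
                bu[u] += lr * (err - reg * bu[u])
                bi[i] += lr * (err - reg * bi[i])
            pu = P[u].copy()  # keep the old user factors for the item update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])

    return lambda u, i: mu + bu[u] + bi[i] + P[u] @ Q[i]

def rmse(predict, ratings):
    """Root mean squared error of predict over (user, item, rating) triples."""
    return float(np.sqrt(np.mean([(r - predict(u, i)) ** 2
                                  for u, i, r in ratings])))

# Synthetic data (an assumption, not MovieLens): ratings driven mostly by
# user and item effects plus noise, with about half of all pairs observed.
rng = np.random.default_rng(42)
n_users, n_items = 30, 20
user_effect = rng.normal(0.0, 0.5, n_users)
item_effect = rng.normal(0.0, 0.5, n_items)
data = [(u, i, float(np.clip(3.5 + user_effect[u] + item_effect[i]
                             + rng.normal(0.0, 0.3), 1.0, 5.0)))
        for u in range(n_users) for i in range(n_items) if rng.random() < 0.5]

# Random 80/20 split into training and test ratings.
order = rng.permutation(len(data))
split = int(0.8 * len(data))
train = [data[j] for j in order[:split]]
test = [data[j] for j in order[split:]]

plain = train_mf(train, n_users, n_items, use_bias=False)
biased = train_mf(train, n_users, n_items, use_bias=True)
global_mean = float(np.mean([r for _, _, r in train]))

rmse_plain = rmse(plain, test)
rmse_biased = rmse(biased, test)
rmse_baseline = rmse(lambda u, i: global_mean, test)
print("plain MF RMSE: ", rmse_plain)
print("biased MF RMSE:", rmse_biased)
print("global mean RMSE:", rmse_baseline)
```

Because the synthetic ratings are generated from user and item effects, this setup favors the biased model by construction; it illustrates the mechanics of the comparison and says nothing about the magnitudes reported in the thesis. The extra per-rating updates of `bu` and `bi` are also where the additional computing time comes from.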
References 1. Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, and John Riedl (1994), “GroupLens: An Open Architecture for Collaborative Filtering of Netnews,” Proceedings of ACM 1994 Conference on Computer Supported Cooperative Work, Chapel Hill, pp. 175-186.

2. Joseph A. Konstan, Bradley N. Miller, David Maltz, Jonathan L. Herlocker, Lee R. Gordon, and John Riedl (1997), “GroupLens: Applying Collaborative Filtering to Usenet News,” Communications of the ACM, Mar 1997, Vol. 40, Issue 3, pp. 77-87.

3. Kwok-Wai Cheung, Kwok-Ching Tsui, and Jiming Liu (2004), “Extended Latent Class Models for Collaborative Recommendation,” IEEE Transactions on Systems, Man, and Cybernetics: Part A, Jan 2004, Vol. 34, Issue 1, pp. 143-148.

4. Wenye Li, Kin-hong Lee, and Kwong-sak Leung (2006), “Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space,” Neural Information Processing Systems (NIPS), pp. 881-888.

5. J. Bennett and S. Lanning (2007), “The Netflix Prize,” KDD Cup and Workshop, 2007; www.netflixprize.com.

6. Daniel E. Ho and Kevin M. Quinn (2008), “Improving the Presentation and Interpretation of Online Ratings Data with Model-Based Figures,” The American Statistician, Nov 2008, Vol. 62, Issue 4, pp. 279-288.

7. Martijn Kagie, Matthijs van der Loos, and Michiel van Wezel (2009), “Including Item Characteristics in the Probabilistic Latent Semantic Analysis Model for Collaborative Filtering,” AI Communications, 2009, Vol. 22, pp. 249-265.

8. Yehuda Koren, Robert Bell, and Chris Volinsky (2009), “Matrix Factorization Techniques for Recommender Systems,” IEEE Computer Society, Aug 2009, Vol. 42, Issue 8, pp. 42-49.

9. Yehuda Koren (2010), “Collaborative Filtering with Temporal Dynamics,” Communications of the ACM, Apr 2010, Vol. 53, Issue 4, pp. 89-98.

10. 張孫浩 (2011), 網路評比資料之統計分析 [Statistical Analysis of Online Rating Data], Master's thesis, Department of Statistics, National Chengchi University.

11. Netflix. Retrieved Jun, 2013, from http://www.netflix.com

12. Amazon. Retrieved Jun, 2013, from http://www.amazon.com

13. TiVo. Retrieved Jun, 2013, from http://www.tgc-taiwan.com.tw/index.php

14. GroupLens Research. Retrieved Nov, 2013, from http://www.grouplens.org
Description Master's
National Chengchi University
Graduate Institute of Statistics
100354028
101
Source http://thesis.lib.nccu.edu.tw/record/#G1003540281
Type thesis
dc.contributor.advisor 翁久幸 (zh_TW)
dc.contributor.advisor Ruby Chui-Hsing Weng (en_US)
dc.contributor.author (Authors) 張良卉 (zh_TW)
dc.creator (Author) 張良卉 (zh_TW)
dc.date (Date) 2012 (en_US)
dc.date.accessioned 1-Jul-2013 17:01:54 (UTC+8)
dc.date.available 1-Jul-2013 17:01:54 (UTC+8)
dc.date.issued (Upload time) 1-Jul-2013 17:01:54 (UTC+8)
dc.identifier (Other Identifiers) G1003540281 (en_US)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/58670
dc.description (Description) Master's (zh_TW)
dc.description (Description) National Chengchi University (zh_TW)
dc.description (Description) Graduate Institute of Statistics (zh_TW)
dc.description (Description) 100354028 (zh_TW)
dc.description (Description) 101 (zh_TW)
dc.description.tableofcontents (zh_TW)
1 Introduction
1.1 Research Background
1.1.1 Introduction to Recommender Systems
1.2 Research Objectives
2 Literature Review
2.1 How Recommender Systems Work
2.2 Collaborative Filtering Recommender Systems
2.2.1 Neighborhood Methods
2.2.2 Latent Factor Models
2.3 Analyzing Rating Data with IRT Models
3 Methodology
3.1 Matrix Factorization
3.2 Matrix Factorization with Bias Terms
4 Empirical Study
4.1 Empirical Data
4.2 Matrix Factorization Analysis
4.2.1 Prediction Results of Matrix Factorization
4.2.2 Prediction Results of Matrix Factorization with Bias Terms
4.3 Comparison of Prediction Results
5 Conclusions and Suggestions

References and Related Bibliography
dc.format.extent 947281 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G1003540281 (en_US)
dc.subject (Keywords) Recommender systems (zh_TW)
dc.subject (Keywords) Collaborative filtering recommender systems (zh_TW)
dc.subject (Keywords) Latent factor models (zh_TW)
dc.subject (Keywords) Matrix factorization (zh_TW)
dc.title (Title) 矩陣分解法對網路評比資料分析之探討 (zh_TW)
dc.title (Title) Matrix Factorization Techniques for Analysis of Online Rating Data (en_US)
dc.type (Type) thesis (en)