Managing online user-generated product reviews using multiple imputation methods

Publications-Theses

Title	Managing online user-generated product reviews using multiple imputation methods (Chinese title: 多重插補法在線上使用者評分之應用)
Author	Li, Cen Jhih (李岑志)
Contributors	Tang, Kwei (唐揆); Cheng, Tsung Chi (鄭宗記); Li, Cen Jhih (李岑志)
Keywords	Opinion mining; Missing data; Multiple imputation
Date	2017
Uploaded	31-Jul-2017 10:57:07 (UTC+8)
Abstract	As the Internet has spread, people increasingly shop online and rate products there, creating a strong word-of-mouth effect. Online user-generated product reviews have therefore become an important source of product-quality information for both producers and customers: customers judge products by other buyers' experiences, and manufacturers use the feedback to improve quality. Many e-commerce websites now collect post-purchase feedback, and some let customers give an overall score together with a text comment. However, customers usually comment only on the features they care about, and later buyers in particular may omit features that earlier reviewers have already mentioned. When the text comments are quantified into numeric feature scores, every unmentioned feature becomes a missing value. Customers may also comment on features that influence neither their satisfaction nor sales, so measuring how strongly each feature drives the overall rating lets manufacturers focus on the most important defects. This study examines the effect of the features customers mention on the overall product rating, and whether the imputed values approximate customers' true opinions. Many previous approaches impute the whole dataset at once, without considering that customers may be influenced by earlier reviews. We propose a multiple-imputation method, verify it through simulation, and apply it to customer reviews of three generations of Canon digital cameras (SX210, SX230, and SX260) on Amazon. The results indicate that the method accurately estimates the effect of each feature on the overall rating and identifies the critical features.
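The abstract describes the method only in outline, so the sketch below illustrates the general idea under our own assumptions and should not be read as the thesis's actual procedure. It assumes feature scores on a 1 to 5 scale, an overall rating that is linear in those scores, a fully observed first 5% of reviews (one of the settings named in the table of contents), and a hypothetical omission mechanism in which a reviewer is more likely to skip a feature the more often earlier reviews have mentioned it. Each missing score is then filled by a hot-deck draw from earlier reviews only, so the imputation respects posting order. The function names (`simulate_reviews`, `sequential_hot_deck`, `ols`, `pool_rubin`) and all parameter values are illustrative.

```python
import numpy as np


def simulate_reviews(n, beta, rng, complete_frac=0.05, base_omit=0.2, omit_slope=0.01):
    """Hypothetical generator of time-ordered reviews (an assumption, not the thesis's model).

    Feature scores are integers 1-5; the overall rating is linear in them.
    After the first `complete_frac` of reviews, reviewer i omits feature j with a
    probability that grows with how many earlier reviews already mentioned j.
    """
    p = beta.size
    X = rng.integers(1, 6, size=(n, p)).astype(float)   # true feature scores
    y = 1.0 + X @ beta + rng.normal(0.0, 0.5, size=n)   # overall rating
    X_obs = X.copy()
    mentions = np.zeros(p)                               # earlier mentions per feature
    n_complete = max(1, int(complete_frac * n))
    for i in range(n):
        if i >= n_complete:
            prob = np.minimum(base_omit + omit_slope * mentions, 0.9)
            X_obs[i, rng.random(p) < prob] = np.nan      # unmentioned -> missing
        mentions += ~np.isnan(X_obs[i])
    return X, X_obs, y


def sequential_hot_deck(X_obs, y, rng, n_donors=5):
    """One imputed copy: each missing X[i, j] is copied from a random donor among the
    `n_donors` earlier reviews (index < i) that observed j and whose overall rating
    is closest to y[i]. Falls back to any observed value if no earlier donor exists."""
    X_imp = X_obs.copy()
    n, p = X_obs.shape
    for i in range(n):
        for j in range(p):
            if np.isnan(X_obs[i, j]):
                donors = np.flatnonzero(~np.isnan(X_obs[:i, j]))
                if donors.size == 0:                     # no earlier donor: use any
                    donors = np.flatnonzero(~np.isnan(X_obs[:, j]))
                nearest = donors[np.argsort(np.abs(y[donors] - y[i]))[:n_donors]]
                X_imp[i, j] = X_obs[rng.choice(nearest), j]
    return X_imp


def ols(X, y):
    """OLS with intercept; returns coefficients and their estimated variances."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / (len(y) - A.shape[1])
    return coef, np.diag(sigma2 * np.linalg.inv(A.T @ A))


def pool_rubin(coefs, variances):
    """Rubin's rules: pooled estimate and total variance W + (1 + 1/m) B."""
    coefs, variances = np.asarray(coefs), np.asarray(variances)
    m = coefs.shape[0]
    within = variances.mean(axis=0)
    between = coefs.var(axis=0, ddof=1)
    return coefs.mean(axis=0), within + (1 + 1 / m) * between


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    beta = np.array([0.6, 0.3, 0.0])                     # third feature is irrelevant
    _, X_obs, y = simulate_reviews(500, beta, rng)
    fits = [ols(sequential_hot_deck(X_obs, y, rng), y) for _ in range(10)]
    est, total_var = pool_rubin([c for c, _ in fits], [v for _, v in fits])
    print("pooled coefficients:", np.round(est, 2))
    print("standard errors:   ", np.round(np.sqrt(total_var), 3))
```

Restricting donors to earlier reviews mirrors the abstract's point that customers may be influenced by reviews posted before theirs, and matching donors on the overall rating preserves part of the association between feature scores and the rating. A fully proper multiple imputation would also resample the donor pool (for example with an approximate Bayesian bootstrap), which this sketch omits.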
Description	Master's thesis, Department of Statistics, National Chengchi University; 104354014
Source	http://thesis.lib.nccu.edu.tw/record/#G0104354014
Type	thesis

dc.contributor.advisor	Tang, Kwei (唐揆); Cheng, Tsung Chi (鄭宗記)	en_US
dc.identifier (Other Identifiers)	G0104354014	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/111445	-
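dc.description.tableofcontents	Chapter 1 Introduction
	  1.1 Research Background
	  1.2 Research Objectives
	  1.3 Thesis Structure
	Chapter 2 Literature Review
	  2.1 Missing Values
	  2.2 Handling Missing Values
	  2.3 Hot-Deck Imputation
	  2.4 Mode Imputation and Parameter Estimation under Single Imputation
	  2.5 Multiple Imputation
	  2.6 Multiple Imputation by Chained Equations
	  2.7 Data Collection
	Chapter 3 Computer Simulation Study
	  3.1 Simulation Design
	    3.1.1 Data Generation
	    3.1.2 Missing-Value Generation
	  3.2 Hot-Deck Multiple Imputation Design
	  3.3 Simulation Results
	    3.3.1 Results When the First 5% of the Data Are Complete
	    3.3.2 Results When the Parameters of Model (1) Are Changed
	    3.3.3 Results When the First 5% of the Data Contain Missing Values
	Chapter 4 Real Data
	  4.1 Data Description and Product Introduction
	  4.2 Imputation Results
	    4.2.1 SX210: Imputation Results and Regression Estimates
	    4.2.2 SX230: Imputation Results and Regression Estimates
	    4.2.3 SX260: Imputation Results and Regression Estimates
	    4.2.4 Summary of Imputation Results for the Three Products
	Chapter 5 Conclusion	en_US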
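The thesis reviews multiple imputation and designs a hot-deck multiple imputation, but this record does not state how the imputed analyses are combined. For reference, the standard combining rules (Rubin's rules) for $m$ completed datasets are shown below, where dataset $\ell$ yields an estimate $\hat{Q}_\ell$ (for example, one regression coefficient of the overall rating on a feature) with estimated variance $U_\ell$. Whether the thesis uses exactly this degrees-of-freedom formula is an assumption.

```latex
\bar{Q} = \frac{1}{m}\sum_{\ell=1}^{m}\hat{Q}_\ell, \qquad
\bar{U} = \frac{1}{m}\sum_{\ell=1}^{m} U_\ell, \qquad
B = \frac{1}{m-1}\sum_{\ell=1}^{m}\bigl(\hat{Q}_\ell-\bar{Q}\bigr)^{2},

T = \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B, \qquad
\nu = (m-1)\left(1+\frac{\bar{U}}{(1+m^{-1})\,B}\right)^{2}, \qquad
\frac{Q-\bar{Q}}{\sqrt{T}} \sim t_{\nu}.
```

The term $(1+1/m)B$ is what a single imputation (such as the mode imputation covered in Chapter 2) leaves out: treating one filled-in dataset as if it were fully observed ignores the between-imputation variance and so understates the uncertainty of the feature effects.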
dc.format.extent	1608671 bytes	-
dc.format.mimetype	application/pdf	-