Title Application of multiple imputation methods to online user ratings
Managing online user-generated product reviews using multiple imputation methods
Author 李岑志
Li, Cen Jhih
Advisors 唐揆; 鄭宗記
Tang, Kwei; Cheng, Tsung Chi
Keywords Opinion mining
Missing data
Multiple imputation
Date 2017
Uploaded 31-Jul-2017 10:57:07 (UTC+8)
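Abstract As the Internet has become widespread, people increasingly shop online and rate products online, which produces a strong word-of-mouth effect. Online product reviews now matter to both manufacturers and consumers: consumers can judge a product from other buyers' experiences, and manufacturers can use consumer ratings to improve product quality. Many e-commerce websites already collect feedback from consumers after purchase.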
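Some of these websites let consumers give a product an overall score and write a text review. However, the product features each consumer comments on usually differ, and later buyers in particular are more likely to leave out an opinion because someone has already expressed it. When each person's text is quantified into numeric scores, the features nobody wrote about leave many missing values in the quantified data.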
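The omission mechanism described above can be mimicked in a simulation. The sketch below is a hypothetical Python illustration, not the thesis's own missing-value generator: reviews are processed in chronological order, and the chance that a reviewer skips a feature grows with the number of earlier reviews that mentioned it. The function name, the linear form of the omission probability, the 0.95 cap, and the parameters `base_omit` and `crowd_omit` are all assumptions.

```python
import random


def simulate_missing(scores, base_omit=0.2, crowd_omit=0.1, seed=0):
    """Mask feature scores to mimic reviewers who skip already-mentioned features.

    scores: complete rows of feature scores in chronological order.
    Assumed mechanism: reviewer i omits feature j with probability
    min(0.95, base_omit + crowd_omit * earlier_mentions[j]); None marks a miss.
    """
    rng = random.Random(seed)
    mentions = [0] * len(scores[0])   # how many earlier reviews scored each feature
    masked = []
    for row in scores:
        new_row = []
        for j, value in enumerate(row):
            p_omit = min(0.95, base_omit + crowd_omit * mentions[j])
            new_row.append(None if rng.random() < p_omit else value)
        # counts are updated after the row, so a reviewer sees only earlier reviews
        for j, value in enumerate(new_row):
            if value is not None:
                mentions[j] += 1
        masked.append(new_row)
    return masked


# Toy example: 200 identical complete reviews of three features
masked = simulate_missing([[4, 5, 3]] * 200, base_omit=0.0, crowd_omit=0.5, seed=1)
```

With `base_omit=0.0` the first reviewer always scores every feature, and later reviewers leave most features blank, which reproduces the pattern of sparse, order-dependent missingness that the abstract describes.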
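Consumers may also mention unimportant features. If one can determine how strongly each feature mentioned in the reviews influences consumers, manufacturers can concentrate on fixing the product's more important shortcomings. This study focuses on the effect of the features consumers mention on the product's overall score, and on whether the imputed values come close to consumers' true opinions.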
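Once the text has been quantified, the effect of each feature on the overall score can be estimated by regressing the overall rating on the feature scores. The sketch below assumes a linear model with an intercept and keeps only complete cases (reviews that scored every feature); with review data that filter discards most rows, which is what motivates imputation. `ols` and `complete_cases` are hypothetical pure-Python helpers; a real analysis would use a statistics library.

```python
def complete_cases(X, y):
    """Keep only the reviews in which every feature score is present."""
    pairs = [(row, yi) for row, yi in zip(X, y) if None not in row]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def ols(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations.

    X: rows of feature scores; y: overall ratings.
    Returns [intercept, beta_1, ..., beta_p].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]   # prepend intercept column
    k = len(rows[0])
    # Augmented system [X'X | X'y]
    A = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for i in range(k):
            if i != col:
                f = A[i][col]
                A[i] = [vi - f * vc for vi, vc in zip(A[i], A[col])]
    return [A[i][k] for i in range(k)]
```

A feature with a large estimated coefficient is one whose score moves the overall rating most, which is the kind of "critical feature" the study aims to identify.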
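Many earlier imputation methods fill in the entire dataset at once, without considering that consumers are influenced by earlier reviews. This study designs a multiple imputation method, validates it through simulation, and uses it to impute consumer review data on Amazon for three generations of Canon digital cameras (SX210, SX230, and SX260). The results indicate that the method accurately estimates the effect of each feature on the overall product score.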
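One way to respect the order of reviews is to fill each missing score only from reviews posted earlier. The sketch below is a hypothetical time-ordered hot deck multiple imputation in Python, not the thesis's exact procedure: the donor rule (same overall rating first, then any earlier review, then any review), the function names, and the default `m = 20` are assumptions. Each of the `m` imputed datasets is analyzed separately and the results are combined with Rubin's rules; to keep the example short, the estimand is simply the mean score of one feature.

```python
import random
import statistics


def time_ordered_hot_deck(data, overall, seed):
    """Fill each missing feature score with a value drawn from an earlier review.

    data: rows of feature scores in chronological order, None = missing.
    overall: overall star rating of each row, used to match donors.
    Assumed donor rule: earlier reviews that observed the feature and gave the
    same overall rating; else any earlier review that observed it; else any
    review that observed it (needed for the very first reviews).
    """
    rng = random.Random(seed)
    imputed = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            earlier = [k for k in range(i) if data[k][j] is not None]
            same = [k for k in earlier if overall[k] == overall[i]]
            anywhere = [k for k in range(len(data)) if data[k][j] is not None]
            pool = same or earlier or anywhere
            if not pool:
                raise ValueError(f"feature {j} is never observed")
            # donors always come from observed data, never from earlier imputations
            imputed[i][j] = data[rng.choice(pool)][j]
    return imputed


def rubin_pool(estimates, variances):
    """Combine m estimates and their within-imputation variances (Rubin's rules)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    u_bar = statistics.fmean(variances)        # average within-imputation variance
    b = statistics.variance(estimates)         # between-imputation variance (divisor m - 1)
    return q_bar, u_bar + (1 + 1 / m) * b      # estimate and total variance


def pooled_feature_mean(data, overall, j, m=20):
    """Multiple-imputation estimate of the mean score of feature j."""
    estimates, variances = [], []
    for s in range(m):
        column = [row[j] for row in time_ordered_hot_deck(data, overall, seed=s)]
        estimates.append(statistics.fmean(column))
        variances.append(statistics.variance(column) / len(column))  # squared SE of the mean
    return rubin_pool(estimates, variances)


# Toy data (not from the thesis): two features, four reviews in posting order
reviews = [[4, 5], [3, None], [None, 2], [5, None]]
stars = [5, 4, 2, 5]
estimate, total_variance = pooled_feature_mean(reviews, stars, j=0, m=5)
```

Drawing donors from observed values keeps imputed scores on the original rating scale, and repeating the draw with different seeds is what lets the between-imputation variance `b` reflect the extra uncertainty caused by the missing data.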
Online user-generated product reviews have become a rich source of product quality information for both producers and customers. As a result, many e-commerce websites allow customers to rate products with scores, and some also accept text comments. However, people usually comment only on the features they care about and may omit features that previous customers have already mentioned. Consequently, the data contain missing values when the comments are analyzed.
In addition, customers may comment on features that influence neither their satisfaction nor sales volume. It is therefore important to identify the significant features so that manufacturers can correct the main defects. Our research focuses on modeling customer reviews and their influence on predicting overall ratings. We aim to determine whether, once the missing values are imputed, the critical features can be identified and the feature ratings authentically reflect customer opinion.
Many previous studies impute the whole dataset at once, without considering that customer reviews might be influenced by earlier reviews. We propose a method based on multiple imputation and apply it to customer reviews of three generations of Canon digital cameras (SX210, SX230, SX260) on Amazon. A simulation study verifies the method's effectiveness, and the method accurately estimates the effect of each feature on the overall rating, which lets it identify the critical features.
Description Master's thesis
National Chengchi University
Department of Statistics
104354014
Source http://thesis.lib.nccu.edu.tw/record/#G0104354014
Type thesis
Identifier G0104354014
URI http://nccur.lib.nccu.edu.tw/handle/140.119/111445
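Table of contents
Chapter 1 Introduction
1.1 Research Background
1.2 Research Objectives
1.3 Thesis Structure
Chapter 2 Literature Review
2.1 Missing Data
2.2 Handling Missing Data
2.3 Hot Deck Imputation
2.4 Mode Imputation and Parameter Estimation under Single Imputation
2.5 Multiple Imputation
2.6 Imputation by Chained Equations
2.7 Data Collection
Chapter 3 Computer Simulation Study
3.1 Simulation Design
3.1.1 Data Generation
3.1.2 Missing Value Generation
3.2 Hot Deck Multiple Imputation Design
3.3 Simulation Results
3.3.1 Results When the First 5% of Records Are Complete
3.3.2 Results After Changing the Parameters of Model (1)
3.3.3 Results When the First 5% of Records Contain Missing Data
Chapter 4 Empirical Data
4.1 Data Description and Product Introduction
4.2 Imputation Results
4.2.1 SX210 Imputation Results and Regression Estimates
4.2.2 SX230 Imputation Results and Regression Estimates
4.2.3 SX260 Imputation Results and Regression Estimates
4.2.4 Summary of Imputation Results for the Three Products
Chapter 5 Conclusion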
File size 1608671 bytes
Format application/pdf