Title Improving Conversion Rate Prediction with Review Text
使用評論文字改善轉換率預測
Author Hsu, Chen-Yu (許振楡)
Contributors Weng, Jiu-Xing (翁久幸)
Hsu, Chen-Yu (許振楡)
Keywords Conversion rate prediction
Review text
Machine learning
Date 2020
Upload time 2-Sep-2020 11:43:53 (UTC+8)
Abstract With the rise of e-commerce platforms, consumer habits have gradually changed, and online reviews have become an important factor in customers' purchase intention. Chevalier and Mayzlin [3] studied this question by taking sales rank as the response variable and building a regression model to assess the significance of review scores and other review features. However, they did not extract features directly from the review text, because natural language preprocessing methods were lacking. Building on the features proposed by Chevalier and Mayzlin [3], this thesis examines whether adding review text information predicts customer purchasing behavior more effectively. The text is represented by two kinds of TFIDF features and by CBOW and Skip-gram word embedding vectors.
The study uses a review data set from a travel e-commerce platform and is divided into three parts. The first part uses machine learning methods to predict review scores from text features; the correlation coefficient between predicted and actual scores lies between 0.2 and 0.4. The second part takes the conversion rate as the prediction target, and the third part predicts whether the next period's conversion rate goes up, goes down, or stays constant. In each case, models with text features are compared against models built only on review scores and other review features. The results show that, on this data set, adding text features improves the prediction of both the conversion rate and its next-period trend when the previous conversion rate is not included in the model, and brings only a small improvement when it is included.
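The three quantities the abstract relies on (TFIDF text features, the correlation between predicted and actual scores, and the up/down/constant trend of the conversion rate) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the toy reviews, the plain log(N/df) weighting, and the tolerance parameter are all assumptions made for illustration, and this record does not specify how the two TFIDF variants or the models were actually built.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one TF-IDF vector per tokenized document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Assumed weighting: idf = log(N / df). Libraries often use smoothed variants.
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def trend_labels(rates, tol=0.0):
    """Label each period-to-period change in a rate series as up, down, or flat."""
    labels = []
    for prev, curr in zip(rates, rates[1:]):
        diff = curr - prev
        if diff > tol:
            labels.append("up")
        elif diff < -tol:
            labels.append("down")
        else:
            labels.append("flat")
    return labels


if __name__ == "__main__":
    # Hypothetical tokenized reviews; real data would need a tokenizer first.
    reviews = [
        ["great", "tour", "friendly", "guide"],
        ["tour", "was", "late"],
        ["great", "value"],
    ]
    vocab, vectors = tfidf_vectors(reviews)
    print(vocab)
    print(vectors[0])
    # Hypothetical predicted vs. actual review scores.
    print(pearson([4.5, 3.0, 4.0], [5, 2, 4]))
    # Hypothetical conversion rates for four consecutive periods.
    print(trend_labels([0.020, 0.025, 0.025, 0.018]))
```

In a setup like the one described, vectors of this kind would be appended to the score-based features before fitting a model, and the trend labels would serve as the classification target for the third part.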
References [1] Gerard Salton and Michael J. McGill. Introduction to Modern Information Retrieval, October 1986.
[2] Tomas Mikolov, Kai Chen, Greg Corrado and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space, September 2013.
[3] Judith A. Chevalier and Dina Mayzlin. The Effect of Word of Mouth on Sales: Online Book Reviews, August 2006.
[4] Eric Clemons, Guodong Gao and Lorin M. Hitt. When Online Reviews Meet Hyperdifferentiation: A Study of the Craft Beer Industry, February 2006.
[5] Nan Hu, Ling Liu and Jie Zhang. Do Online Reviews Affect Product Sales? The Role of Reviewer Characteristics and Temporal Effects, September 2008.
[6] Yong Liu. Word of Mouth for Movies: Its Dynamics and Impact on Box Office Revenue, July 2006.
[7] Jerome H. Friedman. Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, Vol. 29, No. 5, 2001.
[8] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye and Tie-Yan Liu. LightGBM: A Highly Efficient Gradient Boosting Decision Tree, December 2017.
[9] Menno van Zaanen and Pieter Kanters. Automatic Mood Classification Using tf*idf Based on Lyrics. In J. Stephen Downie and Remco C. Veltkamp (eds.), 11th International Society for Music Information Retrieval Conference, August 2010.
[10] Lun-Wei Ku and Hsin-Hsi Chen. Mining opinions from the Web: Beyond relevance retrieval. Journal of the American Society for Information Science and Technology, 58(12), 1838-1850, August 2007.
Description Master's
National Chengchi University
Department of Statistics
107354029
Source http://thesis.lib.nccu.edu.tw/record/#G0107354029
Type thesis
dc.contributor.advisor 翁久幸 zh_TW
dc.contributor.advisor Weng, Jiu-Xing en_US
dc.contributor.author (Authors) 許振楡 zh_TW
dc.contributor.author (Authors) Hsu, Chen-Yu en_US
dc.creator (Author) 許振楡 zh_TW
dc.creator (Author) Hsu, Chen-Yu en_US
dc.date (Date) 2020 en_US
dc.date.accessioned 2-Sep-2020 11:43:53 (UTC+8)
dc.date.available 2-Sep-2020 11:43:53 (UTC+8)
dc.date.issued (Upload time) 2-Sep-2020 11:43:53 (UTC+8)
dc.identifier (Other Identifiers) G0107354029 en_US
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/131481
dc.description (Description) Master's zh_TW
dc.description (Description) National Chengchi University zh_TW
dc.description (Description) Department of Statistics zh_TW
dc.description (Description) 107354029 zh_TW
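dc.description.tableofcontents Chapter 1 Introduction
Chapter 2 Literature Review
Chapter 3 Methodology
3.1 TFIDF Word Embedding Vectors
3.1.1 TFIDF
3.1.2 Text Training Method 1
3.1.3 Text Training Method 2
3.2 Word2vec Word Embedding Vectors
3.3 Embedding Method for Multiple Reviews
3.4 Classification Models
3.4.1 Naive Bayes Classifier
3.4.2 LightGBM
Chapter 4 Data Description
4.1 Review Data Set
4.2 Product Conversion Rate
Chapter 5 Analysis Results
5.1 Correlation Between Review Text and Score
5.2 Predicting Conversion
5.3 Predicting the Next-Period Trend
Chapter 6 Conclusions and Suggestions
References
Appendix 1
zh_TW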
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0107354029 en_US
dc.subject (Keywords) Conversion rate prediction zh_TW
dc.subject (Keywords) Review text zh_TW
dc.subject (Keywords) Machine learning zh_TW
dc.title (Title) 使用評論文字改善轉換率預測 zh_TW
dc.title (Title) Improving Conversion Rate Prediction with Review Text en_US
dc.type (Type) thesis en_US
dc.identifier.doi (DOI) 10.6814/NCCU202001226 en_US