Entropy-driven Optimization of Recommendation Systems through Categorical Feature Engineering

Title	Entropy-driven Optimization of Recommendation Systems through Categorical Feature Engineering (Chinese title: 推薦系統的類別特徵工程基於熵驅動的優化)
Author	Zheng, Jun-Hong (鄭竣鴻)
Contributors	Chou, Pei-Ting (周珮婷) and Chang, Yu-Wei (張育瑋), advisors; Zheng, Jun-Hong (鄭竣鴻), author
Keywords	Categorical variable; Feature selection; Conditional entropy; Recommendation system; Machine learning
Date	2024
Uploaded	1 July 2024, 13:27:41 (UTC+8)
Abstract	Feature selection plays a crucial role in machine learning because it helps improve both the accuracy and the efficiency of models. Conditional entropy is an information-theoretic measure of feature relevance that accounts for the conditional relationships between features, which helps identify features closely related to the target variable. This study explores conditional entropy as a feature selection method for datasets with a large number of categorical variables. Using the KKBox music dataset as an example, we apply conditional entropy to select among the categorical variables and evaluate how the selected feature set affects model performance. The experimental results show that we obtained a model that uses fewer features yet still performs well. This indicates that conditional entropy can serve as an effective feature selection method, helping us discover features closely related to users' listening behavior, thereby simplifying large datasets and improving the computational efficiency of the model.
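As a rough illustration of the idea in the abstract, the Python sketch below scores each categorical feature X by the conditional entropy of the target given that feature, H(Y|X) = Σ_x p(x) H(Y|X=x) = -Σ_{x,y} p(x,y) log2 p(y|x), and keeps the k features with the lowest scores. This is a generic single-feature ranking, not the thesis's own procedure (its table of contents names an "Average of Conditional Entropy Interaction" method that is not described in this record); the function names, the toy columns, and the top-k rule are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(Y), in bits, of a sequence of categorical labels."""
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in probs)


def conditional_entropy(feature: Sequence[Hashable], target: Sequence[Hashable]) -> float:
    """H(Y | X) = sum over categories x of p(x) * H(Y | X = x), in bits."""
    if len(feature) != len(target):
        raise ValueError("feature and target must have the same length")
    n = len(target)
    # Group the target values by the category the feature takes in each row.
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for x, y in zip(feature, target):
        groups[x].append(y)
    # Weight each group's target entropy by the empirical probability p(x).
    return sum(len(ys) / n * entropy(ys) for ys in groups.values())


def rank_features(
    columns: Dict[str, Sequence[Hashable]], target: Sequence[Hashable]
) -> List[Tuple[str, float]]:
    """Return (feature name, H(Y | X)) pairs, lowest conditional entropy first."""
    scores = [(name, conditional_entropy(col, target)) for name, col in columns.items()]
    return sorted(scores, key=lambda pair: pair[1])


def select_top_k(
    columns: Dict[str, Sequence[Hashable]], target: Sequence[Hashable], k: int
) -> List[str]:
    """Keep the k features that leave the least uncertainty about the target."""
    return [name for name, _ in rank_features(columns, target)[:k]]


# Hypothetical toy data: a binary target and three categorical columns.
target = [1, 1, 0, 0, 1, 1, 0, 0]
columns = {
    "perfect": ["a", "a", "b", "b", "a", "a", "b", "b"],  # fully determines the target
    "partial": ["a", "a", "b", "b", "a", "c", "c", "c"],  # ambiguous only in category "c"
    "useless": ["x", "y", "x", "y", "x", "y", "x", "y"],  # independent of the target
}

if __name__ == "__main__":
    for name, score in rank_features(columns, target):
        print(f"{name}: H(Y|X) = {score:.3f} bits")
    print("selected:", select_top_k(columns, target, k=2))
```

A lower H(Y|X) means the feature leaves less uncertainty about the target. One caveat with this naive estimate: it can only stay the same or decrease as a feature's categories are split more finely, and an identifier column with a unique value per row scores exactly 0, so high-cardinality features such as user or song IDs would need separate handling in practice.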
Description	Master's thesis, Department of Statistics, National Chengchi University, 111354009
Source	http://thesis.lib.nccu.edu.tw/record/#G0111354009
Type	thesis

Identifier	G0111354009
URI	https://nccur.lib.nccu.edu.tw/handle/140.119/152130
Table of contents
Chapter 1 Introduction
Chapter 2 Literature Review
  2.1 Feature Selection
  2.2 Conditional Entropy
  2.3 Music Recommendation System
Chapter 3 Methodology
  3.1 Average of Conditional Entropy Interaction
  3.2 Singular Value Decomposition
  3.3 LightGBM Model
Chapter 4 Empirical Analysis
  4.1 Data Description and Preprocessing
  4.2 Feature Engineering
  4.3 Model Training and Evaluation Result
Chapter 5 Conclusion and Future Improvement
  5.1 Conclusion
  5.2 Future Improvement
References
Format	application/pdf, 1,034,142 bytes