Title 統計分析與資料視覺化在電影利潤預測上之研究
Applications of Statistical Analysis and Data Visualization to MovieLens Data for Profit Prediction
Author 洪浚皓
Hung, Chun-Hao
Contributor 張源俊
Chang, Yuan-Chin
洪浚皓
Hung, Chun-Hao
Keywords MovieLens Dataset
Machine Learning
Recommendation System
Exploratory Data Analysis
Date 2020
Upload time 2-Sep-2020 11:42:50 (UTC+8)
Abstract Film has become an important part of entertainment culture, and
the film industry today is large, complex, and hard to predict. Sound and
picture were first successfully synchronized in film in 1927 [10], and
Bambi (1942) marked major progress in animated filmmaking during World
War II. In the following 70 years, as technology advanced and filming
techniques developed, the movie industry grew rapidly. Today, bringing a
film to audiences takes a great deal of effort and many steps. An
accurate prediction of a film's likely profit could therefore help
persuade production companies to invest in it. In this thesis, we
examine trends in genre and other information through exploratory data
analysis and data visualization, and then propose a method for
predicting a movie's potential profit before more resources are
invested. Beyond this main goal of predicting movie profits, we also
discuss how to build a novel recommendation system by modifying the
ideas of a blog post, as a potential future study.
References [1] James Baglama and Lothar Reichel. “Augmented implicitly restarted Lanczos bidiagonalization methods”. In: SIAM Journal on Scientific Computing 27.1 (2005), pp. 19–42.
[2] Posts on Data Science Diarist. Building a Recommendation System with Beer Data. https://www.r-bloggers.com/building-a-recommendation-system-with-beer-data/. Accessed: 2020-05-20.
[3] Timothy A Davis and Yifan Hu. “The University of Florida sparse matrix collection”. In: ACM Transactions on Mathematical Software (TOMS) 38.1 (2011), pp. 1–25.
[4] IMDb. Året gjennom Børfjord (1991). https://www.imdb.com/title/tt0103301/. Accessed: 2020-05-20.
[5] IMDb. Babylon 5. https://www.imdb.com/title/tt0105946/. Accessed: 2020-05-20.
[6] IMDb. Bicicleta, cullera, poma (2010). https://www.imdb.com/title/tt1710542/. Accessed: 2020-05-20.
[7] IMDb. Brazil: In the Shadow of the Stadiums. https://www.imdb.com/title/tt3778744/. Accessed: 2020-05-20.
[8] IMDb. Cialo (original title). https://www.imdb.com/title/tt4358230/. Accessed: 2020-05-20.
[9] IMDb. Das Millionenspiel (1970). https://www.imdb.com/title/tt0066079/. Accessed: 2020-05-20.
[10] IMDb. Don Juan Trivia. https://www.imdb.com/title/tt0016804/trivia. Accessed: 2020-06-16.
[11] IMDb. Im Schmerz geboren. https://www.imdb.com/title/tt3096440/. Accessed: 2020-05-20.
[12] IMDb. In Our Garden (2002). https://www.imdb.com/title/tt0495225/. Accessed: 2020-05-20.
[13] IMDb. Michael Laudrup - en fodboldspiller (1993). https://www.imdb.com/title/tt0378357/. Accessed: 2020-05-20.
[14] IMDb. Moving Alan (2003). https://www.imdb.com/title/tt0310741/. Accessed: 2020-05-20.
[15] IMDb. My Own Man (2014). https://www.imdb.com/title/tt3356434/. Accessed: 2020-05-20.
[16] IMDb. National Theatre Live: Frankenstein (2011). https://www.imdb.com/title/tt1795369/. Accessed: 2020-05-20.
[17] IMDb. P’tit Quinquin. https://www.imdb.com/title/tt3053694/. Accessed: 2020-05-20.
[18] IMDb. Polskie gówno (2014). https://www.imdb.com/title/tt4438688/. Accessed: 2020-05-20.
[19] IMDb. Slaying the Badger. https://www.imdb.com/title/tt3793686/. Accessed: 2020-05-20.
[20] IMDb. Star Trek Beyond (original title). https://www.imdb.com/title/tt2660888/. Accessed: 2020-05-20.
[21] IMDb. Star Trek IV: The Voyage Home (original title). https://www.imdb.com/title/tt0092007/. Accessed: 2020-05-20.
[22] IMDb. Stephen Fry in America. https://www.imdb.com/title/tt1307789/. Accessed: 2020-05-20.
[23] IMDb. The Court-Martial of Jackie Robinson (1990). https://www.imdb.com/title/tt0099311/. Accessed: 2020-05-20.
[24] IMDb. The Dark Knight Trivia. https://www.imdb.com/title/tt0468569/trivia. Accessed: 2020-06-30.
[25] IMDb. Third Reich: The Rise & Fall. https://www.imdb.com/title/tt1855924/. Accessed: 2020-05-20.
[26] IMDb. Two: The Story of Roman Nyro (2013). https://www.imdb.com/title/tt2740874/. Accessed: 2020-05-20.
[27] Guolin Ke et al. “LightGBM: A highly efficient gradient boosting decision tree”. In: Advances in Neural Information Processing Systems. 2017, pp. 3146–3154.
[28] Sven Kosub. “A note on the triangle inequality for the Jaccard distance”. In: Pattern Recognition Letters 120 (2019), pp. 36–38.
[29] MovieLens. Star Trek Beyond. https://movielens.org/movies/135569. Accessed: 2020-05-20.
[30] MovieLens. Star Trek IV: The Voyage Home. https://movielens.org/movies/1376. Accessed: 2020-05-20.
[31] Scott L Phillips. Beyond sound: the college and career guide in music technology. Oxford University Press on Demand, 2013.
[32] Wikipedia. MovieLens. https://en.wikipedia.org/wiki/MovieLens. Accessed: 2020-05-20.
Description Master's
National Chengchi University
Department of Statistics
107354019
Source http://thesis.lib.nccu.edu.tw/record/#G0107354019
Type thesis
dc.contributor.advisor 張源俊 (zh_TW)
dc.contributor.advisor Chang, Yuan-Chin (en_US)
dc.contributor.author (Authors) 洪浚皓 (zh_TW)
dc.contributor.author (Authors) Hung, Chun-Hao (en_US)
dc.creator (Author) 洪浚皓 (zh_TW)
dc.creator (Author) Hung, Chun-Hao (en_US)
dc.date (Date) 2020 (en_US)
dc.date.accessioned 2-Sep-2020 11:42:50 (UTC+8)
dc.date.available 2-Sep-2020 11:42:50 (UTC+8)
dc.date.issued (Upload time) 2-Sep-2020 11:42:50 (UTC+8)
dc.identifier (Other Identifiers) G0107354019 (en_US)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/131476
dc.description (Description) Master's (zh_TW)
dc.description (Description) National Chengchi University (zh_TW)
dc.description (Description) Department of Statistics (zh_TW)
dc.description (Description) 107354019 (zh_TW)
dc.description.tableofcontents (zh_TW)
1 Introduction
2 Introduction of MovieLens dataset
2.1 MovieLens 20M Dataset
2.2 The Calibrated Data
3 EDA on Rating Data
4 Recommendation System
5 Trend of Genres
5.1 Genre Trend
5.2 Genre Similarity Matrix
6 Tag Analysis
7 Predict Movie Profits
7.1 Scraping Dataset
7.2 EDA and Data Cleaning
7.3 Building Model and Prediction
8 Conclusion and Future Studies
9 Reference
dc.format.extent 42852076 bytes
dc.format.mimetype application/pdf
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0107354019 (en_US)
dc.subject (Keywords) MovieLens Dataset (en_US)
dc.subject (Keywords) Machine Learning (en_US)
dc.subject (Keywords) Recommendation System (en_US)
dc.subject (Keywords) Exploratory Data Analysis (en_US)
dc.title (Title) 統計分析與資料視覺化在電影利潤預測上之研究 (zh_TW)
dc.title (Title) Applications of Statistical Analysis and Data Visualization to MovieLens Data for Profit Prediction (en_US)
dc.type (Type) thesis (en_US)
dc.identifier.doi (DOI) 10.6814/NCCU202001674 (en_US)