Citation Information

Title 以資料探勘技術預測電影配樂的使用時機
Timing Prediction of Movie Scoring Based on Data Mining Techniques
Author 段承甫
Duan, Cheng-Fu
Contributors 沈錳坤
Shan, Man-Kwan
段承甫
Duan, Cheng-Fu
Keywords Film score
Timing prediction
Movie segmentation
Data mining
Date 2018
Upload time 3-Sep-2018 15:52:01 (UTC+8)
Abstract A good film score is an indispensable part of an outstanding movie. Composers tailor background scores to a movie's genre and style and place them at the right moments. Much research has addressed video content analysis, but none has addressed predicting when a movie should be scored. This thesis takes movies with outstanding scores as sample data and uses data mining techniques to learn a prediction model of scoring timing, which can then automatically identify the segments of an unscored movie that are suitable for music. The approach can be extended to timing prediction of background music for user-generated video.
The timing prediction problem is transformed into a binary classification problem. To make segments representative of how scores are used, each movie is segmented into scenes, using the alignment between the movie script and subtitles together with shot information. After segmentation, visual features, text features, movie metadata, and other features such as sentiment features are extracted from each scene and used to train the prediction model. The experiments use Decision Tree, Logistic Regression, Support Vector Machine, Random Forest, and Conditional Random Field to identify the key factors that influence scoring timing, to compare prediction results across movies, and to test whether taking scene context into account improves prediction. The results show that the main factors are the time point of a scene within the movie, the proportion of time occupied by dialogue, and dialogue density. Taking scene context into account improves prediction for most movies, and Random Forest performs best among the algorithms (R-Precision about 0.663, area under the ROC curve about 0.675).
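The two evaluation measures reported for the scene classifier, R-Precision and area under the ROC curve, can be illustrated with a minimal sketch. This is not the thesis's evaluation code: the function names, the example scores, and the labels below are hypothetical, and the definitions are the standard textbook ones (precision among the top-R ranked scenes, where R is the number of scenes that actually carry music, and the probability that a scored scene is ranked above an unscored one).

```python
from typing import Sequence


def r_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Precision among the top-R scenes, where R is the number of positives."""
    r = sum(labels)
    if r == 0:
        raise ValueError("R-Precision is undefined without positive scenes")
    # Rank scene indices by predicted score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:r]) / r


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative scene")
    # A tie between a positive and a negative counts as half a correct pair.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for six scenes; label 1 means the scene has music.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 1, 0, 0]
print(r_precision(scores, labels))  # 2 of the top 3 scenes are positive
print(roc_auc(scores, labels))      # 7 of 9 positive/negative pairs are ordered correctly
```

Both measures depend only on how the model ranks scenes, so they can be compared across movies with different proportions of scored scenes.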
Description Master's thesis
National Chengchi University
Department of Computer Science
104753017
Source http://thesis.lib.nccu.edu.tw/record/#G0104753017
Type thesis
dc.contributor.advisor 沈錳坤 zh_TW
dc.contributor.advisor Shan, Man-Kwan en_US
dc.contributor.author (Authors) 段承甫 zh_TW
dc.contributor.author (Authors) Duan, Cheng-Fu en_US
dc.creator (Author) 段承甫 zh_TW
dc.creator (Author) Duan, Cheng-Fu en_US
dc.date (Date) 2018 en_US
dc.date.accessioned 3-Sep-2018 15:52:01 (UTC+8)
dc.date.available 3-Sep-2018 15:52:01 (UTC+8)
dc.date.issued (Upload time) 3-Sep-2018 15:52:01 (UTC+8)
dc.identifier (Other Identifiers) G0104753017 en_US
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/119909
dc.description (Description) Master's thesis zh_TW
dc.description (Description) National Chengchi University zh_TW
dc.description (Description) Department of Computer Science zh_TW
dc.description (Description) 104753017 zh_TW
dc.description.tableofcontents Chapter 1 Introduction
Chapter 2 Related Work
2.1 Video Abstraction
2.2 Movie Trailers
2.2.1 Selecting Trailer Material without a Trained Model
2.2.2 Selecting Trailer Material with a Trained Model
2.3 Video Highlights
Chapter 3 Methodology
3.1 System Architecture
3.2 Movie Segmentation
3.2.1 Shot Boundary Detection
3.2.2 Script-Subtitle Alignment
3.2.3 Shot Merge
3.3 Scene Labeling (Ground Truth)
3.4 Feature Extraction
3.4.1 Visual Features
3.4.2 Text Features
3.4.3 Movie Metadata
3.4.4 Other Features
3.5 Prediction Model
3.5.1 Support Vector Machine (SVM)
3.5.2 Logistic Regression
3.5.3 Decision Trees
3.5.4 Random Forest
3.5.5 Conditional Random Fields (CRF)
Chapter 4 Experiments
4.1 Experimental Data
4.2 Tools Used in the Experiments
4.2.1 Movie Segmentation
4.2.2 Feature Extraction
4.2.3 Prediction Model
4.3 Predicting Scenes Suitable for Scoring
4.3.1 Evaluation
4.3.2 Comparing the Number of Preceding and Following Scenes in CRF
4.3.3 Scene Prediction Results
4.3.4 Considering Scene Context
Chapter 5 Conclusions and Future Work
References
dc.format.extent 2783825 bytes
dc.format.mimetype application/pdf
dc.source.uri (Source) http://thesis.lib.nccu.edu.tw/record/#G0104753017 en_US
dc.subject (Keywords) Film score en_US
dc.subject (Keywords) Timing prediction en_US
dc.subject (Keywords) Movie segmentation en_US
dc.subject (Keywords) Data mining en_US
dc.title (Title) 以資料探勘技術預測電影配樂的使用時機 zh_TW
dc.title (Title) Timing Prediction of Movie Scoring Based on Data Mining Techniques en_US
dc.type (Type) thesis en_US
dc.identifier.doi (DOI) 10.6814/THE.NCCU.CS.012.2018.B02