Title 災難事件下新媒體資訊傳播方式分析與自動化分類設計 ─ 以八八風災為例
Information Transmission Analysis and Automated Classification Design for New Media in a Disaster Event – Case Study of Typhoon Morakot
Author 施旭峰
Shih, Shiuh Feng
Contributors 李蔡彥 (advisor)
Li, Tsai Yen
施旭峰
Shih, Shiuh Feng
Keywords Automated Classification
Typhoon Morakot
Disaster Event
New Media
Date 2013
Uploaded 1 November 2013, 11:43:41 (UTC+8)
Abstract
When a disaster strikes, disaster information must be analyzed and transmitted in real time if it is to serve disaster prevention and relief. As network infrastructure has spread, providers of disaster information have joined the broad online public media, yet search engines alone cannot reflect the current state of an unfolding disaster. Traditional channels such as disaster response centers have limited reporting capacity and are often overwhelmed by the sudden surge of information, whose volume is beyond what people can collect, filter, and process by hand. New tools are therefore needed that can quickly and automatically classify information from new media channels and provide a reliable reference for disaster response and government decision-making.
In this study, we collected data from five different channels during Typhoon Morakot. After text processing and manual classification by experts, we compared the properties of the channel datasets through their frequency distributions, the composition of their classification structures, and their word co-occurrence networks. Ignoring part of speech and grammar, we represented the texts with the vector space model, trained a one-against-one multiclass SVM (OAO-SVM) classifier, and evaluated the performance of automated classification.
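The method described above (a bag-of-words vector space model fed to a one-against-one multiclass SVM) can be sketched with the Python standard library alone. This is a minimal illustration, not the author's implementation: the record does not state the term-weighting scheme, kernel, or solver, so the tf-idf variant (ln(N/df) + 1 with L2 normalization), the linear SVM trained by Pegasos-style subgradient descent without a bias term, the parameter values `lam=0.1` and `epochs=20`, and all function names (`build_idf`, `vectorize`, `train_linear_svm`, `train_oao`, `predict_oao`) are assumptions. Input posts are assumed to be already segmented into words, and the toy posts and labels are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_idf(docs):
    """Inverse document frequency, ln(N / df) + 1 (one common variant; an assumption)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def vectorize(doc, idf):
    """Bag-of-words tf-idf vector as a sparse dict, L2-normalized.

    Word order, part of speech, and grammar are ignored; unseen terms are dropped.
    """
    tf = Counter(term for term in doc if term in idf)
    vec = {term: count * idf[term] for term, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {term: v / norm for term, v in vec.items()}


def dot(w, x):
    return sum(w.get(term, 0.0) * v for term, v in x.items())


def train_linear_svm(xs, ys, lam=0.1, epochs=20):
    """Binary linear SVM (no bias term) via Pegasos-style subgradient descent.

    Targets lam/2 * ||w||^2 + mean hinge loss; ys must be +1 or -1.
    """
    w, step = {}, 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            step += 1
            eta = 1.0 / (lam * step)
            violates = y * dot(w, x) < 1  # margin check uses the current weights
            for term in w:                # L2 shrinkage
                w[term] *= 1.0 - eta * lam
            if violates:                  # hinge-loss subgradient step
                for term, v in x.items():
                    w[term] = w.get(term, 0.0) + eta * y * v
    return w


def train_oao(xs, labels, **svm_kwargs):
    """One-against-one: one binary SVM for every pair of classes."""
    classes = sorted(set(labels))
    models = {}
    for a, b in combinations(classes, 2):
        pair = [(x, 1 if label == a else -1) for x, label in zip(xs, labels) if label in (a, b)]
        models[(a, b)] = train_linear_svm([p[0] for p in pair], [p[1] for p in pair], **svm_kwargs)
    return classes, models


def predict_oao(x, classes, models):
    """Each pairwise model casts one vote; ties go to the earliest class in sorted order."""
    votes = {c: 0 for c in classes}
    for (a, b), w in models.items():
        votes[a if dot(w, x) >= 0 else b] += 1
    return max(classes, key=votes.__getitem__)


# Hypothetical, already-segmented toy posts; labels are illustrative only.
train = [
    (["road", "landslide", "closed"], "traffic"),
    (["bridge", "collapsed", "road"], "traffic"),
    (["trapped", "rescue", "helicopter"], "rescue"),
    (["rescue", "trapped", "villagers"], "rescue"),
    (["supplies", "donation", "water"], "supplies"),
    (["supplies", "needed", "sleeping-bags"], "supplies"),
]
idf = build_idf([doc for doc, _ in train])
xs = [vectorize(doc, idf) for doc, _ in train]
classes, models = train_oao(xs, [label for _, label in train])
for post in (["helicopter", "trapped"], ["road", "closed"], ["water", "donation"]):
    print(post, "->", predict_oao(vectorize(post, idf), classes, models))
```

One-against-one trains k(k-1)/2 binary models for k classes, each on only two classes' data, and predicts by majority vote. A production system would more likely rely on a library solver such as scikit-learn's `SVC`, which handles multiclass problems with a one-vs-one scheme internally.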
The results show that online information about the disaster passes through distinct stages over time, so the progression of the disaster can be followed through each channel. The word co-occurrence networks show that rescue experts use less repetitive and more heterogeneous vocabulary than lay writers. Classifiers trained with OAO-SVM performed better on the expert-maintained channels than on the lay-written ones. When classifiers were cross-compared, each performed better on content from channels similar in nature to the one it was trained on. Training on merged datasets of channels with the same properties showed that a classifier performs well when the training data is of sufficiently high quality, and that when quality is lower, increasing the amount of training data can improve classification performance. We believe the findings, the classification approach, and the information exploration techniques developed in this research can be used to design more efficient and accurate social sensors in the future.
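The comparison of expert and lay writing relies on word co-occurrence networks. The record does not define how edges or "heterogeneity" were measured, so the sketch below is an assumption: it links two distinct words when they appear in the same post, weights each edge by the number of such posts, and reports two illustrative statistics, the type-token ratio (distinct words over total words, where higher means less repetition) and the mean edge weight (where lower means word pairings recur less often). The function names and the two toy channels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(posts):
    """Weighted, undirected co-occurrence graph as a Counter of (word, word) edges.

    Two distinct words are linked when they appear in the same post; the edge
    weight counts how many posts contain that pair.
    """
    edges = Counter()
    for post in posts:
        for pair in combinations(sorted(set(post)), 2):
            edges[pair] += 1
    return edges


def vocabulary_stats(posts):
    """Illustrative repetition/heterogeneity measures (assumes at least one edge)."""
    tokens = [word for post in posts for word in post]
    edges = cooccurrence_network(posts)
    nodes = {word for pair in edges for word in pair}
    return {
        # Distinct words / total words: higher means less repetition.
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Posts per distinct word pair: lower means pairings recur less often.
        "mean_edge_weight": sum(edges.values()) / len(edges),
        "nodes": len(nodes),
        "edges": len(edges),
    }


# Hypothetical, already-segmented posts from two imaginary channels.
expert = [
    ["road", "landslide", "closed"],
    ["bridge", "collapsed", "detour"],
    ["village", "evacuated", "shelter"],
]
lay = [
    ["pray", "taiwan", "safe"],
    ["pray", "safe", "taiwan"],
    ["pray", "taiwan", "hope"],
]
for name, posts in (("expert", expert), ("lay", lay)):
    print(name, vocabulary_stats(posts))
```

Linking words by whole-post co-occurrence is the simplest choice for short messages; a sliding window over longer texts is a common alternative.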
Description Master's thesis
National Chengchi University
Department of Computer Science
99753014
102 (academic year, ROC calendar)
Source http://thesis.lib.nccu.edu.tw/record/#G0099753014
Type thesis
dc.identifier (other identifier) G0099753014
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/61489
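dc.description.tableofcontents Chapter 1 Introduction
1.1. Research Motivation
1.2. Problem Description
1.3. Research Objectives
1.4. Expected Contributions
Chapter 2 Related Work
2.1. Communication Activities During Disasters
2.2. Information Delivery Problems After a Disaster (Sudden Information Overload)
2.3. Backup Channels and Emergent Channels
2.4. The Role of the Internet in Disasters
2.5. Prior Work on Text Message Classification
Chapter 3 System Architecture and Research Methods
3.1. Data Sources
3.2. System Design and Overview
3.3. Data Collection and Dataset Storage
3.4. Data Preprocessing
3.5. Chinese Word Segmentation
3.6. Stop-Word Removal
3.7. Expert Text Classification
3.8. Machine Learning
Chapter 4 System Implementation
4.1. Preprocessing of Each Data Source
4.2. Chinese Word Segmentation
4.3. Stop-Word Removal
4.4. User Interface Design for Expert Classification
4.5. Machine Learning and Classifiers
Chapter 5 Experimental Results and Analysis
5.1. Frequency Analysis
5.2. Word Network Analysis
5.3. Machine Learning Comparison
Chapter 6 Conclusions and Future Work
References
Appendix 1. Expert Text Classification Coding Table
Appendix 2. Word Frequency Statistics of the Academia Sinica Balanced Corpus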
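The outline's preprocessing steps (Chinese word segmentation, then stop-word removal) can be illustrated with a short sketch. The outline does not say which segmentation algorithm the thesis used; this example assumes greedy forward maximum matching against a dictionary, and the function names, dictionary, stop list, and sample sentence are all hypothetical.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary word (up to max_len characters); fall back to a single character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def remove_stop_words(words, stop_words):
    """Drop function words that carry little meaning for classification."""
    return [w for w in words if w not in stop_words]


# Hypothetical dictionary, stop list, and sentence for illustration only.
dictionary = {"八八", "風災", "道路", "坍方", "救援", "直升機"}
stop_words = {"的", "了", "在"}
words = forward_max_match("八八風災的道路坍方了", dictionary)
print(words)
print(remove_stop_words(words, stop_words))
```

Forward maximum matching is fast but greedy: it can mis-segment where two dictionary words overlap, which is why refined variants add disambiguation rules on top of the basic longest-match step.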
dc.format.extent 3227834 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US