Title: 災難事件下新媒體資訊傳播方式分析與自動化分類設計 ─ 以八八風災為例
Information Transmission Analysis and Automated Classification Design for New Media in a Disaster Event – Case Study of Typhoon Morakot
Author: 施旭峰
Shih, Shiuh Feng
Contributors: 李蔡彥
Li, Tsai Yen
施旭峰
Shih, Shiuh Feng
Keywords: Automated Classification
Typhoon Morakot
Disaster Event
New Media
Date: 2013
Upload time: 1-Nov-2013 11:43:41 (UTC+8)
Abstract:
When a disaster strikes, disaster information must be analyzed and transmitted in real time if it is to serve disaster prevention and relief. Now that network infrastructure is widespread, the general online public has joined the ranks of disaster information providers, yet searching through a search engine alone cannot reflect the current state of a disaster in a timely way. Traditional channels such as disaster response centers have limited reporting capacity and are often overwhelmed by the sudden burst of information. This instantaneous surge of data can no longer be collected, filtered, and processed entirely by hand. New tools are therefore needed to classify information from new media channels quickly and automatically, so that it can inform the disaster response system and government decision-making.

In this study, we collected data from five channels during the August 8 floods brought by Typhoon Morakot. After text processing and classification by experts, we compared the datasets of the different channels in terms of frequency distribution, category composition, and word co-occurrence networks. Without considering part of speech or grammar, we used the vector space model to train OAO-SVM classifiers and evaluated the performance of automated classification.

The results show that, after a disaster, online information passes through distinct stages over time, so the progression of the disaster can be followed through each channel. The word co-occurrence networks show that the writing of rescue experts repeats fewer words and is more heterogeneous than lay writing. Classifiers trained with OAO-SVM perform better on channels written by rescue experts than on channels written by lay users. In cross-comparison, a classifier performs better on content from channels of the same nature as the one it was trained on. When datasets with the same properties are merged for training, the classifier performs well if the training data are of sufficiently high quality; when quality is insufficient, performance can be improved by increasing the amount of training data. The findings of this study, together with the classification method and information exploration techniques it developed, can be used in the future to build more efficient and accurate social sensors.
Description: Master's
National Chengchi University
Department of Computer Science
99753014
102
Source: http://thesis.lib.nccu.edu.tw/record/#G0099753014
Type: thesis
Advisor (dc.contributor.advisor): 李蔡彥 (Li, Tsai Yen)
Other identifiers (dc.identifier): G0099753014
URI (dc.identifier.uri): http://nccur.lib.nccu.edu.tw/handle/140.119/61489
Format (dc.format.extent, dc.format.mimetype): 3227834 bytes, application/pdf
Language (dc.language.iso): en_US
Table of contents (dc.description.tableofcontents):
Chapter 1 Introduction
1.1. Research Motivation
1.2. Problem Description
1.3. Research Objectives
1.4. Expected Contributions
Chapter 2 Related Work
2.1. Communication Activities During Disasters
2.2. Information Delivery Problems After a Disaster (Instantaneous Surge)
2.3. Backup Channels and Emergent Channels
2.4. The Role of the Internet in Disasters
2.5. Prior Work on Text Message Classification
Chapter 3 System Architecture and Research Methods
3.1. Data Sources
3.2. System Design and Overview
3.3. Data Collection and Dataset Storage
3.4. Data Preprocessing
3.5. Chinese Word Segmentation
3.6. Stop Word Removal
3.7. Expert Text Classification
3.8. Machine Learning
Chapter 4 System Implementation
4.1. Preprocessing for Each Data Source
4.2. Chinese Word Segmentation
4.3. Stop Word Removal
4.4. User Interface Design for Expert Classification
4.5. Machine Learning and Classifiers
Chapter 5 Experimental Results and Analysis
5.1. Frequency Analysis
5.2. Word Network Analysis
5.3. Machine Learning Comparison
Chapter 6 Conclusions and Future Work
References
Appendix 1: Coding Table for Expert Text Classification
Appendix 2: Word Frequency Statistics of the Academia Sinica Balanced Corpus
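The classification step named in the abstract (vector space model features feeding an OAO-SVM classifier) can be illustrated with a minimal sketch. It assumes that OAO denotes the one-against-one multiclass scheme, uses TF-IDF as the term weighting, and fits each pairwise linear SVM with a plain subgradient-descent loop (no bias term) rather than whatever solver the thesis used. The toy English tokens, the three category names, and all function names are hypothetical; the thesis worked with segmented Chinese text and its own expert coding scheme.

```python
import math
from collections import Counter
from itertools import combinations


def fit_idf(docs):
    """Compute inverse document frequencies from tokenized documents."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log(n / count) + 1.0 for tok, count in df.items()}


def to_vector(tokens, idf):
    """Map a token list to an L2-normalized sparse TF-IDF vector (a dict)."""
    tf = Counter(tok for tok in tokens if tok in idf)  # unseen tokens are ignored
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def dot(w, x):
    """Dot product of a sparse weight dict and a sparse document vector."""
    return sum(w.get(tok, 0.0) * v for tok, v in x.items())


def train_binary_svm(vectors, signs, lam=0.01, epochs=50):
    """Linear SVM (hinge loss, L2 penalty) fitted by subgradient descent."""
    w, step = {}, 0
    for _ in range(epochs):
        for x, y in zip(vectors, signs):
            step += 1
            eta = 1.0 / (lam * step)          # decaying step size
            violated = y * dot(w, x) < 1.0    # margin check on current weights
            for tok in w:
                w[tok] *= 1.0 - eta * lam     # shrinkage from the L2 penalty
            if violated:
                for tok, v in x.items():
                    w[tok] = w.get(tok, 0.0) + eta * y * v
    return w


def train_one_against_one(vectors, labels):
    """Train one binary SVM for every unordered pair of classes."""
    models = {}
    for a, b in combinations(sorted(set(labels)), 2):
        pair = [(x, 1 if y == a else -1)
                for x, y in zip(vectors, labels) if y in (a, b)]
        models[(a, b)] = train_binary_svm([x for x, _ in pair],
                                          [s for _, s in pair])
    return models


def predict(models, x):
    """Each pairwise model casts one vote; the class with most votes wins."""
    votes = Counter(a if dot(w, x) > 0 else b for (a, b), w in models.items())
    return votes.most_common(1)[0][0]


# Hypothetical, pre-tokenized toy messages; the labels are illustrative only.
training = [
    (["trapped", "villagers", "helicopter"], "rescue"),
    (["rescue", "team", "trapped"], "rescue"),
    (["need", "water", "food"], "supplies"),
    (["blankets", "food", "shelter"], "supplies"),
    (["bridge", "collapsed", "road"], "traffic"),
    (["road", "closed", "landslide"], "traffic"),
]
docs = [tokens for tokens, _ in training]
labels = [label for _, label in training]
idf = fit_idf(docs)
models = train_one_against_one([to_vector(d, idf) for d in docs], labels)

print(predict(models, to_vector(["helicopter", "trapped"], idf)))
```

With k classes the scheme trains k(k-1)/2 pairwise models, so the three toy categories yield three models. Cross-channel evaluation of the kind the abstract describes would amount to fitting `idf` and `models` on one channel's messages and calling `predict` on another's.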