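Information Transmission Analysis and Automated Classification Design for New Media in a Disaster Event – Case Study of Typhoon Morakot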

Title	災難事件下新媒體資訊傳播方式分析與自動化分類設計 ─ 以八八風災為例 (Information Transmission Analysis and Automated Classification Design for New Media in a Disaster Event – Case Study of Typhoon Morakot)
Author	施旭峰 Shih, Shiuh Feng
Contributors	李蔡彥 Li, Tsai Yen (advisor); 施旭峰 Shih, Shiuh Feng (author)
Keywords	Automated Classification; Typhoon Morakot; Disaster Event; New Media (自動化分類; 八八風災; 災難事件; 新媒體)
Date	2013
Upload time	1-Nov-2013 11:43:41 (UTC+8)
Abstract	When a disaster occurs, disaster information must be analyzed and transmitted in real time if it is to serve disaster prevention and relief. Now that network infrastructure is widespread, the providers of disaster information include the broad public media on the internet, and retrieval through search engines alone cannot reflect the current state of a disaster in real time. Traditional channels such as disaster response centers have limited reporting capacity and are often overwhelmed by the sudden burst of information. This burst of data can no longer be collected, filtered, and processed entirely by hand, so new tools are needed to classify information from new media channels quickly and automatically, giving disaster response systems and government decision makers a reference. This study collects data from five channels during the August 8 floods caused by Typhoon Morakot. After text processing and classification by experts, we compare the datasets of the different channels by frequency distribution, category composition, and word co-occurrence networks. Without considering part of speech or grammar, we use the vector space model to train one-against-one SVM (OAO-SVM) classifiers and evaluate the performance of automated classification.
The analysis shows that, after a disaster, information on the internet passes through distinct stages over time, so the progress of the disaster can be followed through each channel. The word co-occurrence networks show that writing by rescue experts repeats fewer words and is more heterogeneous than lay writing. The OAO-SVM classifiers perform better on channels written by rescue experts than on channels written by laypeople. In cross-channel comparison, a classifier performs better on content from channels of the same nature. When datasets with the same properties are merged for training, the classifier performs well if the training data is of sufficient quality; if it is not, performance can be improved by adding more training data. The findings of this study, together with the classification method and information exploration techniques it developed, can be used in the future to build more efficient and accurate social sensors.
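The word co-occurrence network mentioned in the abstract can be illustrated with a minimal sketch. It assumes that two terms are linked whenever they appear in the same message and that the edge weight is the number of messages they share; the record does not state the thesis's actual definition, so this is only one plausible reading. The function name `cooccurrence_edges` and the sample messages are hypothetical and are not taken from the thesis data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Count how many documents each unordered pair of distinct terms shares."""
    edges = Counter()
    for tokens in documents:
        # Count each term once per document; sorting makes (a, b) and (b, a)
        # collapse into a single key.
        for pair in combinations(sorted(set(tokens)), 2):
            edges[pair] += 1
    return edges


# Hypothetical, already-segmented messages (assumption: not thesis data).
docs = [
    ["road", "closed", "flood"],
    ["flood", "rescue", "needed"],
    ["road", "flood", "rescue"],
]

edges = cooccurrence_edges(docs)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Under this reading, a channel whose messages reuse the same vocabulary produces a few heavily weighted edges, while a channel with more varied wording produces many lightly weighted ones.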
Description	Master's thesis; National Chengchi University; Department of Computer Science; 99753014; 102
Source	http://thesis.lib.nccu.edu.tw/record/#G0099753014
Type	thesis

dc.contributor.advisor	李蔡彥	zh_TW
dc.contributor.advisor	Li, Tsai Yen	en_US
dc.contributor.author (Authors)	施旭峰	zh_TW
dc.contributor.author (Authors)	Shih, Shiuh Feng	en_US
dc.creator (Author)	施旭峰	zh_TW
dc.creator (Author)	Shih, Shiuh Feng	en_US
dc.date (Date)	2013	en_US
dc.date.accessioned	1-Nov-2013 11:43:41 (UTC+8)	-
dc.date.available	1-Nov-2013 11:43:41 (UTC+8)	-
dc.date.issued (Upload time)	1-Nov-2013 11:43:41 (UTC+8)	-
dc.identifier (Other Identifiers)	G0099753014	en_US
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/61489	-
dc.description (Description)	Master's	zh_TW
dc.description (Description)	National Chengchi University	zh_TW
dc.description (Description)	Department of Computer Science	zh_TW
dc.description (Description)	99753014	zh_TW
dc.description (Description)	102	zh_TW
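dc.description.tableofcontents	Chapter 1 Introduction; 1.1 Research Motivation; 1.2 Problem Description; 1.3 Research Objectives; 1.4 Expected Contributions; Chapter 2 Related Work; 2.1 Communication Activities During Disasters; 2.2 Information Delivery Problems After a Disaster (Sudden Massive Volume); 2.3 Backup Channels and Emergent Channels; 2.4 The Role of the Internet in Disasters; 2.5 Prior Work on Text Message Classification; Chapter 3 System Architecture and Research Methods; 3.1 Data Sources; 3.2 System Design and Overview; 3.3 Data Collection and Dataset Storage; 3.4 Data Preprocessing; 3.5 Chinese Word Segmentation; 3.6 Stop Word Removal; 3.7 Expert Text Classification; 3.8 Machine Learning; Chapter 4 System Implementation; 4.1 Preprocessing of Each Data Source; 4.2 Chinese Word Segmentation; 4.3 Stop Word Removal; 4.4 User Interface Design for Expert Classification; 4.5 Machine Learning and Classifiers; Chapter 5 Experimental Results and Analysis; 5.1 Frequency Analysis; 5.2 Word Network Analysis; 5.3 Machine Learning Comparison; Chapter 6 Conclusions and Future Work; References; Appendix 1: Expert Text Classification Codebook; Appendix 2: Academia Sinica Balanced Corpus Word Frequency Statistics	zh_TW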
dc.format.extent	3227834 bytes	-
dc.format.mimetype	application/pdf	-
dc.language.iso	en_US	-
dc.source.uri (Source)	http://thesis.lib.nccu.edu.tw/record/#G0099753014	en_US
dc.subject (Keyword)	自動化分類	zh_TW
dc.subject (Keyword)	八八風災	zh_TW
dc.subject (Keyword)	災難事件	zh_TW
dc.subject (Keyword)	新媒體	zh_TW
dc.subject (Keyword)	Automated Classification	en_US
dc.subject (Keyword)	Typhoon Morakot	en_US
dc.subject (Keyword)	Disaster Event	en_US
dc.subject (Keyword)	New Media	en_US
dc.title (Title)	災難事件下新媒體資訊傳播方式分析與自動化分類設計 ─ 以八八風災為例	zh_TW
dc.title (Title)	Information Transmission Analysis and Automated Classification Design for New Media in a Disaster Event – Case Study of Typhoon Morakot	en_US
dc.type (Type)	thesis	en
