Title 藉由氣象資料應用羅吉斯迴歸及決策樹模型來預測颱風及水災期間成災與否
Predict Disaster during Typhoons and Floods with Meteorological data by using Logistic Regression and Decision Tree models
Author 王崇飛
Wang, Chung-Fei
Contributors 張家銘
Chang, Jia-Ming
王崇飛
Wang, Chung-Fei
Keywords Disaster
Weather
Logistic Regression
Decision Tree
Rain
Wind speed
Typhoon
Flood
Date 2022
Upload time 5-Oct-2022 09:09:06 (UTC+8)
Abstract Because of its geographical location, Taiwan faces natural disasters such as typhoons and floods every year. Although disasters cannot be prevented, technology can reduce the threat and damage they cause.
Growing computing power has made big data, artificial intelligence, and machine learning popular topics in recent years, yet data analysis is rarely applied to disaster reports and meteorological data in disaster prevention and response. This thesis therefore builds logistic regression and decision tree models from meteorological data and disaster information.
The study collects historical meteorological data, disaster reports, and weather station data, cleans them by unifying field formats and removing irrelevant records, and then integrates them per station according to how the records relate to one another. The result is the baseline dataset for the subsequent statistics and modeling.
For each weather station, a model is built with the station's meteorological data as the independent variables and the disaster data as the dependent variable. The data are split into training and test sets under different sampling methods, the model produces predictions for the test set, and a confusion matrix is used to compare accuracy, precision, recall, and F1-score under the different conditions.
The highest average accuracy is 99.7%, the highest average precision is 67.9%, the highest average recall is 81.9%, and the highest average F1-score is 48.6%. Considering stations individually, the highest F1-score is 96.6%, at station C0M730 (East District, Chiayi City). Apart from C0M730, 60 stations among the 224 models built in this thesis reach the expected performance (F1-score > 80%). The cases that fall short are to be improved in future work with other algorithms or sampling methods.
Technology cannot change the climate, but it can change how we prepare for and respond to it: prepare as well as possible for the worst case.
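The gap the abstract reports between a very high average accuracy and a much lower average F1-score is what one would expect when disaster records are rare compared with non-disaster records. The following is a minimal sketch of how the four scores are derived from a confusion matrix. It is not taken from the thesis: the function names and the label counts are made up for illustration, and the label convention (1 = disaster, 0 = no disaster) is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels; 1 = disaster is an assumed convention."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def scores(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1, returning 0.0 when a ratio is undefined."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical, heavily imbalanced test set: 1000 records, only 10 disasters.
y_true = [1] * 10 + [0] * 990
# Hypothetical predictions: 4 disasters caught, 6 missed, 2 false alarms.
y_pred = [1] * 4 + [0] * 6 + [1] * 2 + [0] * 988

counts = confusion_counts(y_true, y_pred)
result = scores(*counts)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.992
# precision: 0.667
# recall: 0.400
# f1: 0.500
```

With these made-up counts the 988 correctly predicted non-disaster records dominate accuracy, while F1 depends only on the disaster class, so it is the more informative score for comparing stations.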
Description Master's degree
National Chengchi University
In-service Master's Program, Department of Computer Science
106971017
Source http://thesis.lib.nccu.edu.tw/record/#G0106971017
Type thesis
dc.contributor.advisor 張家銘
dc.contributor.advisor Chang, Jia-Ming
dc.identifier (Other Identifiers) G0106971017
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/142100
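dc.description.tableofcontents Acknowledgements
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
Section 1 Research Motivation
Section 2 Thesis Structure
Chapter 2 Data Preprocessing
Section 1 Data Collection
1. Disaster Data
2. Meteorological Data
3. Station Data
Section 2 Data Cleaning and Preparation
1. Data Cleaning
2. Data Processing
Chapter 3 Data Integration and Statistics
Section 1 Merging Meteorological and Disaster Data
Section 2 Building the Overall Detailed Dataset (Meteorological and Disaster)
Section 3 Building the Detailed Dataset per Station (Meteorological and Disaster)
Section 4 Building the Detailed Dataset per Station by Disaster Category (Meteorological and Disaster)
Section 5 Expanding Disaster Occurrence Counts
Section 6 Summary
Chapter 4 Analysis
Section 1 Overview of the Analysis
1. Analysis Workflow
2. Algorithms
3. Sampling Methods
4. Confusion Matrix
Section 2 Analysis by Station
1. Record Counts
2. Logistic Regression
3. Decision Tree
Section 3 Analysis by Disaster Category
1. Record Counts
2. Logistic Regression
3. Decision Tree
Section 4 Comparison of Overall Results
1. Accuracy
2. Precision
3. Recall
4. F1-Score
5. Comparison by Station
Section 5 Summary
Chapter 5 Conclusions and Recommendations
References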
dc.format.extent 5823465 bytes
dc.format.mimetype application/pdf
dc.identifier.doi (DOI) 10.6814/NCCU202201508