Title: A Study of Two-tier Filtering Schemes for Anti-spam (兩階層式垃圾郵件過濾機制之研究)
Authors: 葉生正, 蘇民揚, 張僩鈞
Keywords: Support Vector Machine (SVM); Naive Bayes; Information Gain
Date: 2006
Uploaded: 18-Dec-2017 17:38:26 (UTC+8)
Abstract: Spam is rampant today and has prompted a wide range of countermeasures. Among content-based filtering methods, the machine-learning algorithms Support Vector Machine (SVM) and Naive Bayes stand out. Building on SVM's fast classification through its separating hyperplane and on the flexible threshold setting of Naive Bayes, this paper proposes a two-tier anti-spam filtering scheme that combines SVM with an improved Naive Bayes model. The experiments use 1,000 training mails and 200 test mails in each of Chinese and English. After Chinese word segmentation and English tokenization, Information Gain scores determine the keywords that form the SVM training vectors (first tier). The paper then defines four margin distances around the hyperplane; test mails whose SVM results fall inside the resulting ambiguous zone are passed to the second tier, where the proposed improved Bayesian probability model scores them to decide their class. All four margin settings improve accuracy, by about 1% to 4%, with the Maximum Distance and Average Distance margins giving the largest gains. With the optimal model, overall classification accuracy exceeds 97% on both the Chinese and English samples, which supports the feasibility and contribution of the proposed two-tier scheme and improved Naive Bayes model.
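The two-tier flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the SVM is taken as already trained (the hypothetical `weights` and `bias` inputs of a linear decision function), a single symmetric `margin` threshold stands in for the paper's four margin distances (whose definitions are not given in this record), and a standard Laplace-smoothed multinomial Naive Bayes stands in for the authors' improved Bayesian model. All function and class names are hypothetical.

```python
import math
from collections import Counter


def entropy(p):
    """Binary entropy H(p) in bits; zero-probability terms contribute 0."""
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)


def information_gain(docs, labels, term):
    """IG(C; t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)].

    docs: list of token lists; labels: 1 = spam, 0 = legitimate mail.
    """
    n = len(docs)
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    conditional = sum(
        len(g) / n * entropy(sum(g) / len(g)) for g in (with_t, without_t) if g
    )
    return entropy(sum(labels) / n) - conditional


def select_keywords(docs, labels, k):
    """Return the k terms with the highest IG (ties broken alphabetically)."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))[:k]


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing.

    A generic stand-in; NOT the improved Bayesian model proposed in the paper.
    Assumes both classes appear in the training data.
    """

    def __init__(self, docs, labels, vocab):
        self.vocab = set(vocab)
        self.log_prior, self.log_lik = {}, {}
        for c in (0, 1):
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in d if t in self.vocab)
            denom = sum(counts.values()) + len(self.vocab)
            self.log_lik[c] = {t: math.log((counts[t] + 1) / denom) for t in self.vocab}

    def predict(self, doc):
        scores = {
            c: self.log_prior[c] + sum(self.log_lik[c][t] for t in doc if t in self.vocab)
            for c in (0, 1)
        }
        return max(scores, key=scores.get)


def two_tier_classify(doc, keywords, weights, bias, margin, nb):
    """Tier 1: linear SVM score w.x + b on binary keyword features.

    Tier 2: if |score| < margin (the ambiguous zone), defer to Naive Bayes.
    Returns (label, tier_used).
    """
    x = [1.0 if t in doc else 0.0 for t in keywords]
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    if abs(score) >= margin:
        return (1 if score > 0 else 0), "svm"
    return nb.predict(doc), "bayes"


if __name__ == "__main__":
    docs = [["free", "money", "win"], ["free", "offer"],
            ["meeting", "schedule"], ["project", "meeting", "notes"]]
    labels = [1, 1, 0, 0]
    keywords = select_keywords(docs, labels, 2)  # ["free", "meeting"]
    nb = NaiveBayes(docs, labels, set().union(*docs))
    # Hypothetical weights standing in for a trained linear SVM.
    print(two_tier_classify(["free", "meeting", "money"], keywords, [2.0, -2.0], 0.0, 1.0, nb))
    # prints (1, 'bayes')
```

In the toy run, the mail containing both "free" and "meeting" scores 0 on the SVM, well inside the margin, so it is routed to the Bayesian tier. This mirrors the idea of letting SVM decide confident cases quickly and reserving probabilistic scoring for mails near the hyperplane.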
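Relation: TANET 2006 台灣網際網路研討會論文集 (Proceedings of TANET 2006, Taiwan Internet Conference)
Information and Communication Security; Prevention of Inappropriate Information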
Type: conference
dc.date.accessioned 18-Dec-2017 17:38:26 (UTC+8)
dc.date.available 18-Dec-2017 17:38:26 (UTC+8)
dc.identifier.uri http://nccur.lib.nccu.edu.tw/handle/140.119/115202
dc.format.extent 372586 bytes
dc.format.mimetype application/pdf