Please use this identifier to cite or link to this item: https://ah.lib.nccu.edu.tw/handle/140.119/115202
Title: A Study of Two-tier Filtering Schemes for Anti-spam (兩階層式垃圾郵件過濾機制之研究)
Authors: 葉生正
蘇民揚
張僩鈞
Keywords: SVM; Naive Bayes; Information Gain
Date: 2006
Uploaded: 18-Dec-2017
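Abstract: Spam is rampant today, and a wide range of countermeasures has emerged in response. Among content-based filtering methods, the machine-learning techniques Support Vector Machine (SVM) and Naïve Bayes perform best. This study therefore combines SVM's fast classification through a separating hyperplane with the flexibility of Naïve Bayes to design a two-tier spam filtering scheme. The experiments use 1,000 training emails and 200 test emails for each of Chinese and English. After Chinese word segmentation and English tokenization, the keywords used to train the SVM are selected according to their Information Gain scores. The SVM then classifies the test samples, and four margin distances defined in this paper are used to pick out the emails that fall within an ambiguous region around the hyperplane; these emails are scored with the improved Bayesian probability model proposed in this study to decide their class. The results show that accuracy improves under all four margin distances once the selected samples are re-scored, with the largest gains from the Maximum Distance and Average Distance settings. With the optimized model, overall classification accuracy exceeds 97% for both the Chinese and the English samples, which supports the feasibility and contribution of the proposed two-tier filtering scheme and improved Naïve Bayes model.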
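The keyword-selection step can be illustrated with a minimal sketch of Information Gain over binary term-presence features. It assumes each email has already been segmented or tokenized into a set of terms and that the class labels are "spam" and "ham"; the record does not give the paper's exact IG formulation or how many keywords were kept, so `k` and the helper names here are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the email' w.r.t. the class label.

    IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)]
    """
    n = len(labels)
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    h_class = entropy(list(Counter(labels).values()))
    h_cond = 0.0
    for subset in (with_t, without_t):
        if subset:  # skip empty partitions (weight 0)
            h_cond += len(subset) / n * entropy(list(Counter(subset).values()))
    return h_class - h_cond


def select_keywords(docs, labels, k):
    """Return the k terms with the highest IG (ties broken alphabetically).

    `k` is a hypothetical parameter; these terms would become the SVM's feature set.
    """
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]
```

With four toy emails `[{"free", "money"}, {"free", "offer"}, {"meeting", "agenda"}, {"project", "agenda"}]` labeled spam, spam, ham, ham, the terms "free" and "agenda" each split the classes perfectly and get IG = 1 bit, so `select_keywords(..., 2)` returns `["agenda", "free"]`.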
The Support Vector Machine (SVM) and Naive Bayes are well-known machine-learning algorithms for content-based spam filtering. Building on the fast hyperplane classification of SVM and the flexible threshold setting of Naive Bayes, this paper proposes a two-tier anti-spam filtering scheme that combines SVM with a new Naive Bayes model. In the first tier, Information Gain is used to select the keywords that form the SVM training vectors. The paper also defines four kinds of margin around the hyperplane and picks out the samples that fall within the margin for a second-tier Bayesian probability calculation that decides their class. The experimental results indicate that all four margin settings improve accuracy by about 1% to 4%, especially the Maximum Distance and Average Distance margins. Additionally, the optimal model achieves a total accuracy above 97% on both the Chinese and English sample mails. These results support the feasibility of the proposed two-tier filtering scheme and new Naive Bayes model.
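The two-tier routing described in the abstract can be sketched roughly as follows, under several stated assumptions. The linear SVM is represented by a hypothetical, already-trained weight dictionary and bias (no training is shown); a positive decision value is taken to mean spam; a single fixed `margin` threshold stands in for the paper's four margin definitions, which the record does not spell out; and the second tier uses a standard multinomial Naive Bayes with Laplace smoothing, not the paper's improved Bayesian model.

```python
import math
from collections import Counter


def svm_score(tokens, weights, bias):
    """Linear SVM decision value f(x) = w . x + b over binary term features."""
    return sum(weights.get(t, 0.0) for t in set(tokens)) + bias


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing (stand-in second tier)."""

    def __init__(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, y in zip(docs, labels):
            self.counts[y].update(tokens)
        self.vocab = set().union(*(self.counts[c] for c in self.classes))
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}

    def log_posterior(self, tokens, c):
        """Unnormalized log P(c | tokens) with add-one smoothing."""
        v = len(self.vocab)
        score = math.log(self.prior[c])
        for t in tokens:
            score += math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
        return score

    def predict(self, tokens):
        return max(self.classes, key=lambda c: self.log_posterior(tokens, c))


def two_tier_classify(tokens, weights, bias, margin, nb):
    """Tier 1: trust the SVM when |f(x)| >= margin; tier 2: fall back to Bayes.

    Returns (label, tier) so callers can see which tier made the decision.
    """
    f = svm_score(tokens, weights, bias)
    if abs(f) >= margin:
        return ("spam" if f > 0 else "ham"), "svm"
    return nb.predict(tokens), "bayes"
```

For example, with hypothetical weights `{"free": 1.5, "offer": 1.0, "agenda": -1.5, "meeting": -1.0}`, bias 0 and margin 1.0, an email containing only "money" has f(x) = 0, falls inside the ambiguous region, and is decided by the Bayes tier instead of the hyperplane. Widening or narrowing `margin` trades how many emails the cheaper SVM tier settles against how many are re-scored.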
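Relation: TANET 2006 Taiwan Internet Conference Proceedings
Information and Communication Security; Prevention of Inappropriate Information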
Type: conference
Appears in Collections: Conference Papers
