兩階層式垃圾郵件過濾機制之研究 | NCCU Academic Hub

Title	兩階層式垃圾郵件過濾機制之研究 (A Study of Two-tier Filtering Schemes for Anti-spam)
Authors	葉生正; 蘇民揚; 張僩鈞
Keywords	Support Vector Machine (支援向量機, SVM); Naive Bayes (貝氏演算法); Information Gain (資訊增益)
Date	2006
Upload time	18-Dec-2017 17:38:26 (UTC+8)
Abstract	Spam is now pervasive, and many countermeasures have emerged in response. Among content-based filtering methods, two machine-learning algorithms stand out: the Support Vector Machine (SVM) and Naive Bayes. This paper combines the fast hyperplane-based classification of SVM with the flexible threshold setting of Naive Bayes in a two-tier spam filtering scheme. The experiments use 1,000 training mails and 200 test mails for each of Chinese and English. After Chinese word segmentation and English tokenization, Information Gain is used to select the keywords that form the SVM training vectors. In the first tier, the SVM classifies the test samples, and four margin distances defined in the paper are used to pick out the mails that fall in the ambiguous region around the hyperplane. In the second tier, these mails are scored by the paper's improved Naive Bayes probability model to decide their class. The results show that all four margin settings improve accuracy, by about 1% to 4%, with the Maximum Distance and Average Distance margins giving the largest gains. Under the optimal model, overall classification accuracy exceeds 97% for both the Chinese and the English samples. These results support the feasibility and contribution of the proposed two-tier filtering scheme and improved Naive Bayes model.
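Relation	TANET 2006 台灣網際網路研討會論文集 (TANET 2006 conference proceedings); 資通安全、不當資訊防治 (Information and Communication Security; Prevention of Inappropriate Information)
Type	conference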
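
The first-tier steps named in the abstract (keyword selection by Information Gain, then routing of low-confidence mails to a second tier) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, and the single symmetric `margin` threshold are assumptions made for illustration, and the paper's four margin definitions and its improved Naive Bayes model are not reproduced here because the abstract does not specify them.

```python
import math
from typing import List, Sequence, Set


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = sum(1 for y in labels if y == cls) / n
        result -= p * math.log2(p)
    return result


def information_gain(docs: Sequence[Set[str]], labels: Sequence[int], term: str) -> float:
    """Information Gain of the binary feature 'term present / term absent'."""
    n = len(labels)
    if n == 0:
        return 0.0
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def select_keywords(docs: Sequence[Set[str]], labels: Sequence[int], k: int) -> List[str]:
    """Return the k terms with the highest Information Gain (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


def needs_second_tier(decision_value: float, margin: float) -> bool:
    """Assumed routing rule: a mail is ambiguous when its first-tier score
    (for example, a signed distance to the SVM hyperplane) satisfies |score| < margin."""
    return abs(decision_value) < margin


# Toy, made-up data: each mail is a set of tokens; 1 = spam, 0 = legitimate.
docs = [{"free", "prize"}, {"free", "offer"}, {"meeting", "agenda"}, {"meeting", "offer"}]
labels = [1, 1, 0, 0]
keywords = select_keywords(docs, labels, 2)
```

In this toy set, "free" and "meeting" separate the two classes perfectly, so they rank first, while "offer" appears equally in both classes and carries no information. Mails for which `needs_second_tier` returns true would be handed to a second-tier scorer.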

dc.creator (Author)	葉生正	zh_TW
dc.creator (Author)	蘇民揚	zh_TW
dc.creator (Author)	張僩鈞	zh_TW
dc.date (Date)	2006
dc.date.accessioned	18-Dec-2017 17:38:26 (UTC+8)	-
dc.date.available	18-Dec-2017 17:38:26 (UTC+8)	-
dc.date.issued (Upload time)	18-Dec-2017 17:38:26 (UTC+8)	-
dc.identifier.uri (URI)	http://nccur.lib.nccu.edu.tw/handle/140.119/115202	-
dc.format.extent	372586 bytes	-
dc.format.mimetype	application/pdf	-
dc.relation (Relation)	TANET 2006 台灣網際網路研討會論文集	zh_TW
dc.relation (Relation)	資通安全、不當資訊防治	zh_TW
dc.subject (Keywords)	支援向量機 ; 貝氏演算法 ; 資訊增益	zh_TW
dc.subject (Keywords)	SVM ; Naive Bayes ; Information Gain	en_US
dc.title (Title)	兩階層式垃圾郵件過濾機制之研究	zh_TW
dc.title (Title)	A Study of Two-tier Filtering Schemes for Anti-spam	en_US
dc.type (Type)	conference