Spam E-mail Classification Based on the IFWB Algorithm | |
---|---|
Academic year | 101 |
Semester | 2 |
Publication (presentation) date | 2013-03-19 |
Title | Spam E-mail Classification Based on the IFWB Algorithm |
Title (other language) | |
Author | Jou, Chi-Chang |
Affiliation | |
Publisher | |
Source title, volume/issue, pages | Lecture Notes in Artificial Intelligence 7802, pp. 314-324 |
Abstract | The problem of spam overflow has not been completely solved. Many anti-spam techniques have been proposed; among them, machine learning techniques are the most popular, but they assume a static environment. In real-world applications, the email context may change through concept drift. Classification results are usually good at first, but as time passes and concepts drift, classification accuracy gradually declines. A mechanism is therefore needed to adjust the classifier according to both newly incoming emails and the older emails in the dataset. Another problem in email categorization is data skew. Because of spam overflow, spam emails far outnumber legitimate ones. As a result, the majority class achieves a high recall rate while the minority class achieves a poor one. For these reasons, we propose an algorithm, IFWB (Incremental Forgetting Weighted Bayesian), based on Naïve Bayesian classification and IGICF (Information Gain and Inverse Class Frequency) feature extraction, combined with a gradual forgetting mechanism and a cost-sensitive model, to tackle concept drift and data skew. Finally, we demonstrate the effectiveness of the IFWB algorithm through a series of experiments. |
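Keywords | spam classification; incremental forgetting; misclassification cost |
Language | en |
ISSN | 0302-9743 |
Journal type | International |
Indexed in | EI |
Industry-academia collaboration | |
Corresponding author | Jou, Chi-Chang |
Peer review | No |
Country | DEU |
Open call for papers | |
Publication format | Print |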
Related links | Institutional repository link ( http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/108550 ) |
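
The abstract names two mechanisms, gradual forgetting and a cost-sensitive model, layered on a Naïve Bayesian classifier. The record does not give the actual IFWB formulas, so the following is only a minimal sketch of how such a combination might look, not the paper's algorithm. The class name `ForgettingCostSensitiveNB`, the per-batch `decay` factor, the cost table, and the labels `"spam"` and `"ham"` are all assumptions made for illustration, and IGICF feature extraction is omitted because it is not defined here.

```python
import math
from collections import defaultdict


class ForgettingCostSensitiveNB:
    """Illustrative multinomial Naive Bayes with decayed counts.

    This is NOT the IFWB algorithm; the decay rule, the cost table,
    and every name in this class are assumptions for the sketch.
    """

    def __init__(self, decay=0.9, costs=None, alpha=1.0):
        self.decay = decay  # assumed forgetting factor, applied once per batch
        self.alpha = alpha  # Laplace smoothing constant
        # costs[true_label][predicted_label]: assumed misclassification costs
        self.costs = costs or {"spam": {"ham": 1.0}, "ham": {"spam": 5.0}}
        self.class_counts = defaultdict(float)
        self.word_counts = defaultdict(lambda: defaultdict(float))
        self.vocab = set()

    def update(self, batch):
        """Fade old evidence, then add a batch of (tokens, label) pairs."""
        for label in self.class_counts:
            self.class_counts[label] *= self.decay
            for word in self.word_counts[label]:
                self.word_counts[label][word] *= self.decay
        for tokens, label in batch:
            self.class_counts[label] += 1.0
            for word in tokens:
                self.word_counts[label][word] += 1.0
                self.vocab.add(word)

    def posteriors(self, tokens):
        """Return normalized class posteriors; call update() at least once first."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(count / total)
            for word in tokens:
                if word in self.vocab:  # unseen words are ignored
                    score += math.log((counts.get(word, 0.0) + self.alpha) / denom)
            log_scores[label] = score
        peak = max(log_scores.values())
        scaled = {k: math.exp(v - peak) for k, v in log_scores.items()}
        norm = sum(scaled.values())
        return {k: v / norm for k, v in scaled.items()}

    def predict(self, tokens):
        """Pick the label with the lowest expected misclassification cost."""
        post = self.posteriors(tokens)

        def risk(pred):
            return sum(
                p * self.costs.get(true, {}).get(pred, 0.0)
                for true, p in post.items()
                if true != pred
            )

        return min(post, key=risk)
```

In this sketch, each call to `update` multiplies all stored counts by `decay` before adding the new batch, so older emails contribute less over time. `predict` minimizes expected cost rather than picking the most probable class, which is one common way to counter class imbalance; raising the assumed cost of labeling a legitimate email as spam makes the classifier more reluctant to do so.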