Support Vector Machines Parameter Selection Based on Combined Taguchi Method and Staelin Method for E-mail Spam Filtering
Keywords:
Support Vector Machines, Taguchi Method, Grid Search
Abstract
Support vector machines (SVMs) are a powerful tool for building effective spam filtering models. However, model performance depends on parameter selection: the SVM parameters chosen for training strongly affect classification performance. In this study, we combine the Taguchi method and the Staelin method to optimize the parameters of an SVM-based e-mail spam filtering model and improve its filtering accuracy. We compare this approach with other parameter optimization methods, such as grid search. Six real-world e-mail data sets are used to demonstrate the effectiveness and feasibility of the method. The results show that the proposed method finds effective models with high classification accuracy.
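The abstract contrasts exhaustive grid search with a combined Taguchi and Staelin search but does not spell out either procedure. The Python sketch below is a hedged illustration, not the authors' implementation. Several pieces are assumptions: `cv_accuracy` is a hypothetical synthetic surface standing in for the cross-validation accuracy of an RBF-kernel SVM at C = 2^log2c and gamma = 2^log2g; the search ranges are assumed; the Taguchi step uses a standard L9 orthogonal array with a larger-the-better signal-to-noise (S/N) ratio; and the re-center-and-halve loop is only loosely modeled on Staelin's iterative refinement.

```python
import math

# HYPOTHETICAL stand-in for k-fold cross-validation accuracy of an RBF-kernel SVM
# at C = 2**log2c and gamma = 2**log2g. A real run would train and score a model
# (for example with LIBSVM); this smooth synthetic surface simply peaks at (3, -5).
def cv_accuracy(log2c: float, log2g: float) -> float:
    return 0.95 * math.exp(-((log2c - 3.0) ** 2 + (log2g + 5.0) ** 2) / 50.0)

def grid_search(f, c_range, g_range):
    """Exhaustive search: evaluate every (log2c, log2g) pair and keep the best."""
    best, evals = None, 0
    for c in c_range:
        for g in g_range:
            acc = f(c, g)
            evals += 1
            if best is None or acc > best[0]:
                best = (acc, c, g)
    return best, evals

# Standard L9(3^4) orthogonal array with levels coded 0, 1, 2. Only the first two
# columns are used, one per factor: log2(C) and log2(gamma).
L9 = [
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]

def sn_larger_better(y: float) -> float:
    # Taguchi larger-the-better S/N ratio for one observation: -10*log10(1/y^2).
    return 20.0 * math.log10(y)

def taguchi_step(f, c_center, g_center, step):
    """Run one L9 experiment around a center; return the best level of each factor."""
    c_levels = [c_center - step, c_center, c_center + step]
    g_levels = [g_center - step, g_center, g_center + step]
    sn_by_c, sn_by_g = [[], [], []], [[], [], []]
    for row in L9:
        i, j = row[0], row[1]
        sn = sn_larger_better(f(c_levels[i], g_levels[j]))
        sn_by_c[i].append(sn)
        sn_by_g[j].append(sn)
    # Main-effects analysis: for each factor, keep the level with the highest mean S/N.
    best_i = max(range(3), key=lambda k: sum(sn_by_c[k]) / len(sn_by_c[k]))
    best_j = max(range(3), key=lambda k: sum(sn_by_g[k]) / len(sn_by_g[k]))
    return c_levels[best_i], g_levels[best_j]

def taguchi_refine(f, c_center, g_center, step, rounds):
    """ASSUMED Staelin-style loop: re-center on the chosen levels, then halve the step."""
    evals = 0
    for _ in range(rounds):
        c_center, g_center = taguchi_step(f, c_center, g_center, step)
        evals += len(L9)
        step /= 2.0
    return (f(c_center, g_center), c_center, g_center), evals

if __name__ == "__main__":
    grid_best, grid_evals = grid_search(cv_accuracy, range(-5, 16, 2), range(3, -16, -2))
    tag_best, tag_evals = taguchi_refine(cv_accuracy, 5.0, -6.0, 6.0, rounds=5)
    print("grid search:", grid_best, "evaluations:", grid_evals)
    print("Taguchi + refinement:", tag_best, "evaluations:", tag_evals)
```

Under these assumptions, the grid evaluates 110 parameter pairs. Five rounds of the L9 loop use 45 evaluations and end at (3.125, -4.875), close to the synthetic optimum. On real data, each evaluation would be a full cross-validation run, so reducing the number of evaluations is where such a design saves time.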
Copyright Notice
Submission of a manuscript implies that the work described has not been published before and that it is not under consideration for publication elsewhere. If and when the manuscript is accepted for publication, authors can retain copyright in their articles with no restrictions. Authors can also post the final, peer-reviewed manuscript version (postprint) to any repository or website.
Since Jan. 01, 2019, IJETI has published new articles under the Creative Commons Attribution Non-Commercial 4.0 International (CC BY-NC 4.0) License.
The Creative Commons Attribution Non-Commercial (CC BY-NC) License permits use, distribution, and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.