Peer-Reviewed

SMS Spam Filtering Using Machine Learning Techniques: A Survey

Received: 28 September 2016     Accepted: 5 November 2016     Published: 5 December 2016
Abstract

Objective: To review machine learning and hybrid algorithms for detecting SMS spam messages and to compare them by accuracy. Data sources: Original articles written in English found in Sciencedirect.com, Google-scholar.com, Search.com, IEEE Xplore, and the ACM library. Study selection: Articles dealing with machine learning and hybrid approaches to SMS spam filtering. Data extraction: Articles were retrieved by searching for a predefined string; the outcome was reviewed by one author and checked by the second. The primary paper was reviewed and edited by the third author. Results: A total of 44 articles concerning machine learning and hybrid methods for detecting SMS spam messages were selected. From these papers, 28 methods and algorithms were extracted and studied, and 15 of them are compared in one table according to their accuracy, strengths, and weaknesses in detecting spam messages in the Tiago dataset of spam messages. Among the proposed methods, the DCA algorithm, the large cellular network method, and graph-based KNN are the three most accurate in filtering SMS spam in the Tiago dataset. Hybrid methods are also discussed in this paper.

Published in Machine Learning Research (Volume 1, Issue 1)
DOI 10.11648/j.mlr.20160101.11
Page(s) 1-14
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2016. Published by Science Publishing Group

Keywords

Spam Filtering, Machine Learning Algorithms, SMS Spam

Cite This Article
  • APA Style

    Hedieh Sajedi, Golazin Zarghami Parast, Fatemeh Akbari. (2016). SMS Spam Filtering Using Machine Learning Techniques: A Survey. Machine Learning Research, 1(1), 1-14. https://doi.org/10.11648/j.mlr.20160101.11


    ACS Style

    Hedieh Sajedi; Golazin Zarghami Parast; Fatemeh Akbari. SMS Spam Filtering Using Machine Learning Techniques: A Survey. Mach. Learn. Res. 2016, 1(1), 1-14. doi: 10.11648/j.mlr.20160101.11


    AMA Style

    Hedieh Sajedi, Golazin Zarghami Parast, Fatemeh Akbari. SMS Spam Filtering Using Machine Learning Techniques: A Survey. Mach Learn Res. 2016;1(1):1-14. doi: 10.11648/j.mlr.20160101.11


  • @article{10.11648/j.mlr.20160101.11,
      author = {Hedieh Sajedi and Golazin Zarghami Parast and Fatemeh Akbari},
      title = {SMS Spam Filtering Using Machine Learning Techniques: A Survey},
      journal = {Machine Learning Research},
      volume = {1},
      number = {1},
      pages = {1-14},
      doi = {10.11648/j.mlr.20160101.11},
      url = {https://doi.org/10.11648/j.mlr.20160101.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.mlr.20160101.11},
      year = {2016}
    }
    


  • TY  - JOUR
    T1  - SMS Spam Filtering Using Machine Learning Techniques: A Survey
    AU  - Hedieh Sajedi
    AU  - Golazin Zarghami Parast
    AU  - Fatemeh Akbari
    Y1  - 2016/12/05
    PY  - 2016
    N1  - https://doi.org/10.11648/j.mlr.20160101.11
    DO  - 10.11648/j.mlr.20160101.11
    T2  - Machine Learning Research
    JF  - Machine Learning Research
    JO  - Machine Learning Research
    SP  - 1
    EP  - 14
    PB  - Science Publishing Group
    SN  - 2637-5680
    UR  - https://doi.org/10.11648/j.mlr.20160101.11
    VL  - 1
    IS  - 1
    ER  - 


Author Information
  • Dept. of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran

  • Dept. of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran

  • Dept. of Electrical, Computer and Information Technology, Islamic Azad University, Tehran, Iran
