Review Article | Peer-Reviewed

Machine Learning for Text Classification on Twitter: A Literature Review

Received: 17 September 2023     Accepted: 16 October 2023     Published: 9 November 2023
Abstract

This literature review examines the application of machine learning (ML) techniques to text classification on Twitter. Social media platforms such as Twitter generate an immense volume of data, which creates a need for automated methods that extract useful information from it. ML has attracted significant attention in this context because it can learn patterns and relationships from large datasets. The review covers the background and aim of ML for text classification on Twitter, the methods employed, the results obtained, and the conclusions drawn. It first discusses the background and aim, highlighting the role of ML in tasks that are central to social media analysis, such as sentiment analysis, topic modeling, and spam detection. It then gives an overview of the methods used in studies of text classification on Twitter data, covering recent ML approaches, feature extraction methods such as bag-of-words, n-grams, and word embeddings, and the preprocessing steps needed to prepare Twitter data for classification. Subsequently, the review presents the results reported by different studies. It discusses the performance metrics used to evaluate ML models, such as accuracy, precision, recall, and F1-score, and the variation in performance across classification tasks, which gives insight into the strengths and limitations of the approaches used.
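The feature extraction and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed studies: the function names (`preprocess`, `ngram_counts`, `binary_metrics`), the cleaning rules (dropping URLs and @mentions, keeping hashtag words), and the example data are all assumptions chosen for illustration, and real pipelines typically rely on a library such as scikit-learn rather than hand-written counters.

```python
import re
from collections import Counter


def preprocess(tweet):
    """Lowercase a tweet, drop URLs and @mentions, keep hashtag words, tokenize.

    These cleaning rules are illustrative assumptions, not a standard.
    """
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = text.replace("#", " ")              # keep the hashtag word, drop '#'
    return re.findall(r"[a-z0-9']+", text)


def ngram_counts(tokens, n_max=2):
    """Bag-of-n-grams: count every contiguous n-gram for n = 1..n_max."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    tokens = preprocess("Loving the new phone! https://t.co/abc @shop #happy")
    print(tokens)
    print(ngram_counts(tokens, n_max=2).most_common(3))
    print(binary_metrics([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))
```

The n-gram counts would normally be turned into a fixed-length vector over a shared vocabulary before being passed to a classifier. The metrics are computed for a single positive class; multi-class tasks such as three-way sentiment usually average them per class.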

Published in American Journal of Data Mining and Knowledge Discovery (Volume 8, Issue 1)
DOI 10.11648/j.ajdmkd.20230801.12
Page(s) 11-17
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2023. Published by Science Publishing Group

Keywords

Machine Learning, Text Classification, Twitter Data, NLP

Cite This Article
  • APA Style

    Alsurori, M., Enan, A., Alwan, R., Algumaei, W., Alturki, S., et al. (2023). Machine Learning for Text Classification on Twitter: A Literature Review. American Journal of Data Mining and Knowledge Discovery, 8(1), 11-17. https://doi.org/10.11648/j.ajdmkd.20230801.12


    ACS Style

    Alsurori, M.; Enan, A.; Alwan, R.; Algumaei, W.; Alturki, S., et al. Machine Learning for Text Classification on Twitter: A Literature Review. Am. J. Data Min. Knowl. Discov. 2023, 8(1), 11-17. doi: 10.11648/j.ajdmkd.20230801.12


    AMA Style

    Alsurori M, Enan A, Alwan R, Algumaei W, Alturki S, et al. Machine Learning for Text Classification on Twitter: A Literature Review. Am J Data Min Knowl Discov. 2023;8(1):11-17. doi: 10.11648/j.ajdmkd.20230801.12


  • @article{10.11648/j.ajdmkd.20230801.12,
      author = {Muneer Alsurori and Ahlam Enan and Rahuf Alwan and Wafa Algumaei and Somia Alturki and Entsar Alkahtany},
      title = {Machine Learning for Text Classification on Twitter: A Literature Review},
      journal = {American Journal of Data Mining and Knowledge Discovery},
      volume = {8},
      number = {1},
      pages = {11-17},
      doi = {10.11648/j.ajdmkd.20230801.12},
      url = {https://doi.org/10.11648/j.ajdmkd.20230801.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajdmkd.20230801.12},
     year = {2023}
    }
    


  • TY  - JOUR
    T1  - Machine Learning for Text Classification on Twitter: A Literature Review
    AU  - Muneer Alsurori
    AU  - Ahlam Enan
    AU  - Rahuf Alwan
    AU  - Wafa Algumaei
    AU  - Somia Alturki
    AU  - Entsar Alkahtany
    Y1  - 2023/11/09
    PY  - 2023
    N1  - https://doi.org/10.11648/j.ajdmkd.20230801.12
    DO  - 10.11648/j.ajdmkd.20230801.12
    T2  - American Journal of Data Mining and Knowledge Discovery
    JF  - American Journal of Data Mining and Knowledge Discovery
    JO  - American Journal of Data Mining and Knowledge Discovery
    SP  - 11
    EP  - 17
    PB  - Science Publishing Group
    SN  - 2578-7837
    UR  - https://doi.org/10.11648/j.ajdmkd.20230801.12
    VL  - 8
    IS  - 1
    ER  - 


Author Information
  • Muneer Alsurori, Information Technology Science, Ibb University, Ibb, Yemen

  • Ahlam Enan, Information Technology Science, Ibb University, Ibb, Yemen

  • Rahuf Alwan, Information Technology Science, Ibb University, Ibb, Yemen

  • Wafa Algumaei, Information Technology Science, Ibb University, Ibb, Yemen

  • Somia Alturki, Information Technology Science, Ibb University, Ibb, Yemen

  • Entsar Alkahtany, Information Technology Science, Ibb University, Ibb, Yemen
