Peer-Reviewed

Distinguishing True and Fake News by Using Text Mining and Machine Learning Algorithm

Received: 6 July 2020     Accepted: 21 July 2020     Published: 19 September 2020
Abstract

With recent advances in social media and technology as a whole, the number of online news sources has increased. As a result, more people want a convenient way to find recent, relevant, and up-to-date news articles and posts on social media platforms, and many are now comfortable with social media as their main source of news. Unfortunately, receiving news through social media platforms and unverified online sites has given rise to many problems, one of which is fake news (news that contains incorrect or biased facts and statements). People all around the world are exposed to fake news and risk being misinformed or becoming victims of propaganda. Verifying whether news is real or fake is therefore an important process that all news should go through, because fake news can have immense consequences. To address this worldwide problem, we used text preprocessing techniques to digest the content of articles and several mathematical vector representations to assess the legitimacy of a news article. Data on real news and fake news were collected from Kaggle, and the words used in examples of fake and real news were extracted using Python. We conclude that the trained machine learning algorithms distinguished real news from fake news with high accuracy, which indicates that the approach was successful.
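The abstract does not spell out the pipeline, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a bag-of-words representation and a multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. The function names and the four toy headlines are invented for illustration and are not taken from the Kaggle data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(samples):
    """Build per-label word counts from (text, label) pairs."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training documents
    for text, label in samples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, doc_counts, vocab


def predict(text, model):
    """Return the label with the highest smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        # Log prior: share of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented purely for illustration.
TOY_SAMPLES = [
    ("Shocking miracle cure doctors hate revealed", "fake"),
    ("You will not believe this shocking secret", "fake"),
    ("Parliament passed the budget bill on Tuesday", "real"),
    ("The central bank raised interest rates on Tuesday", "real"),
]

model = train(TOY_SAMPLES)
print(predict("shocking secret cure", model))
print(predict("the bank passed the budget", model))
```

With only four training sentences the predictions say nothing about real-world accuracy. The point is the shape of the workflow: tokenize the text, count words per class, then score a new article against each class.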

Published in American Journal of Data Mining and Knowledge Discovery (Volume 5, Issue 2)
DOI 10.11648/j.ajdmkd.20200502.11
Page(s) 20-26
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2020. Published by Science Publishing Group

Keywords

Fake News, Preprocessing Data, Data Analysis, Text Mining, Machine Learning

Cite This Article
  • APA Style

    Hyunseo Lee, Ian Paik Choe, Jioh In, Han Sol Kim. (2020). Distinguishing True and Fake News by Using Text Mining and Machine Learning Algorithm. American Journal of Data Mining and Knowledge Discovery, 5(2), 20-26. https://doi.org/10.11648/j.ajdmkd.20200502.11


    ACS Style

    Hyunseo Lee; Ian Paik Choe; Jioh In; Han Sol Kim. Distinguishing True and Fake News by Using Text Mining and Machine Learning Algorithm. Am. J. Data Min. Knowl. Discov. 2020, 5(2), 20-26. doi: 10.11648/j.ajdmkd.20200502.11


    AMA Style

    Hyunseo Lee, Ian Paik Choe, Jioh In, Han Sol Kim. Distinguishing True and Fake News by Using Text Mining and Machine Learning Algorithm. Am J Data Min Knowl Discov. 2020;5(2):20-26. doi: 10.11648/j.ajdmkd.20200502.11


  • @article{10.11648/j.ajdmkd.20200502.11,
      author = {Hyunseo Lee and Ian Paik Choe and Jioh In and Han Sol Kim},
      title = {Distinguishing True and Fake News by Using Text Mining and Machine Learning Algorithm},
      journal = {American Journal of Data Mining and Knowledge Discovery},
      volume = {5},
      number = {2},
      pages = {20-26},
      doi = {10.11648/j.ajdmkd.20200502.11},
      url = {https://doi.org/10.11648/j.ajdmkd.20200502.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajdmkd.20200502.11},
     year = {2020}
    }
    


  • TY  - JOUR
    T1  - Distinguishing True and Fake News by Using Text Mining and Machine Learning Algorithm
    AU  - Hyunseo Lee
    AU  - Ian Paik Choe
    AU  - Jioh In
    AU  - Han Sol Kim
    Y1  - 2020/09/19
    PY  - 2020
    N1  - https://doi.org/10.11648/j.ajdmkd.20200502.11
    DO  - 10.11648/j.ajdmkd.20200502.11
    T2  - American Journal of Data Mining and Knowledge Discovery
    JF  - American Journal of Data Mining and Knowledge Discovery
    JO  - American Journal of Data Mining and Knowledge Discovery
    SP  - 20
    EP  - 26
    PB  - Science Publishing Group
    SN  - 2578-7837
    UR  - https://doi.org/10.11648/j.ajdmkd.20200502.11
    VL  - 5
    IS  - 2
    ER  - 


Author Information
  • Seoul International School, Gyeonggi-do, South Korea

  • Seoul Foreign School, Seoul, South Korea

  • Princeton International School of Mathematics and Science, New Jersey, the United States

  • Fayston Preparatory School, Gyeonggi-do, South Korea
