Automatic literature classification via machine learning has attracted increasing attention in many research circles, especially the computing community, because large bodies of research articles are now available in diverse fields. Existing work has largely drawn features from segments of articles such as abstracts, body content and metadata, with little or no attention to references. This paper posits that correlating article and reference features would enhance the performance of machine learning algorithms. We therefore exploited the correlation between the TF-IDF of articles and that of their references, using association rule-based and cosine similarity-based correlation methods, to classify computing literature. We focused on the Adekunle Ajasin University Research Repository. The research articles in the repository were labelled by experienced computing professionals according to the ACM and Denning taxonomies. Logistic Regression, Support Vector Machine and Multilayer Perceptron Neural Network classifiers with N-gram features were explored. For the ACM taxonomy, the highest accuracy and F1-score were 0.56 and 0.41, respectively, for association rule-based correlation; 0.62 and 0.51 for similarity-based correlation; and 0.59 and 0.46 for the existing article-based classification. For Denning’s taxonomy, the highest accuracy and F1-score were 0.41 and 0.40, respectively, for association rule-based correlation; 0.41 and 0.36 for similarity-based correlation; and 0.38 and 0.37 for the existing article-based classification. These results suggest that both correlation methods have better prospects than the popular abstract-based classification method for the automatic classification of computing literature.
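The abstract does not spell out how the cosine similarity between an article and its references is computed, so the following is only a minimal sketch under stated assumptions: each article's text and the concatenated titles of its references are turned into TF-IDF vectors over a shared vocabulary, and the cosine of the two vectors is taken as the article-reference correlation. The toy texts, the whitespace tokenizer, the smoothed IDF formula and every function name here are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data: (article text, concatenated reference titles).
papers = [
    ("neural network training for image recognition",
     "deep learning image classification convolutional neural network"),
    ("database query optimization with indexes",
     "protein folding simulation molecular dynamics"),
]

# Assumption: IDF is fitted over articles and references together,
# so both kinds of text share one vocabulary and one weighting.
all_texts = [text for pair in papers for text in pair]
vecs = tfidf_vectors(all_texts)

# One article-reference similarity score per paper.
similarities = [
    cosine_similarity(vecs[2 * i], vecs[2 * i + 1])
    for i in range(len(papers))
]
```

In this sketch the first paper shares terms with its references and so receives a positive score, while the second shares none and scores zero. How such a score would then be combined with the N-gram features before classification is not stated in the abstract and is left open here.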
Published in | American Journal of Computer Science and Technology (Volume 5, Issue 4) |
DOI | 10.11648/j.ajcst.20220504.12 |
Page(s) | 204-209 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2022. Published by Science Publishing Group |
Keywords | Computing, Research Articles, Machine Learning, Classification, Reference Features |
APA Style
Oluwafemi Oriola, Lawrence Ojo, Ojonoka Atawodi. (2022). Automatic Classification of Computing Literatures via Article and Reference Correlation. American Journal of Computer Science and Technology, 5(4), 204-209. https://doi.org/10.11648/j.ajcst.20220504.12
ACS Style
Oluwafemi Oriola; Lawrence Ojo; Ojonoka Atawodi. Automatic Classification of Computing Literatures via Article and Reference Correlation. Am. J. Comput. Sci. Technol. 2022, 5(4), 204-209. doi: 10.11648/j.ajcst.20220504.12
@article{10.11648/j.ajcst.20220504.12,
  author = {Oluwafemi Oriola and Lawrence Ojo and Ojonoka Atawodi},
  title = {Automatic Classification of Computing Literatures via Article and Reference Correlation},
  journal = {American Journal of Computer Science and Technology},
  volume = {5},
  number = {4},
  pages = {204-209},
  doi = {10.11648/j.ajcst.20220504.12},
  url = {https://doi.org/10.11648/j.ajcst.20220504.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajcst.20220504.12},
  year = {2022}
}
TY  - JOUR
T1  - Automatic Classification of Computing Literatures via Article and Reference Correlation
AU  - Oluwafemi Oriola
AU  - Lawrence Ojo
AU  - Ojonoka Atawodi
Y1  - 2022/10/21
PY  - 2022
N1  - https://doi.org/10.11648/j.ajcst.20220504.12
DO  - 10.11648/j.ajcst.20220504.12
T2  - American Journal of Computer Science and Technology
JF  - American Journal of Computer Science and Technology
JO  - American Journal of Computer Science and Technology
SP  - 204
EP  - 209
PB  - Science Publishing Group
SN  - 2640-012X
UR  - https://doi.org/10.11648/j.ajcst.20220504.12
VL  - 5
IS  - 4
ER  -