The COVID-19 crisis has increased the need for effective international communication and information sharing across many fields. One obstacle to meeting this need is language. Existing machine translation systems work well on general text because they are trained on general-domain data, but they are often inaccurate on specialized texts such as COVID-19 research. Manual translation, meanwhile, is slow and laborious. As a result, the exchange of information is delayed. To overcome this language barrier, this project aimed to create a model that is effective specifically for translating COVID-19 crisis-related text. Two models were created: the first was trained on the TAUS English-French Corona Crisis Corpus, and the second used transfer learning, being trained first on a Kaggle English-French corpus and then on the TAUS corpus. The model architecture consisted of four bidirectional GRU layers and used RMSprop as the optimizer, and the models were evaluated using the BLEU score. The first model achieved a higher BLEU score than the second, supporting the hypothesis that training on loosely related datasets decreases translation quality. Future research will evaluate this model on other language pairs and on datasets from other specialized fields.
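The comparison between the two models rests on BLEU, but the abstract does not say which BLEU implementation, tokenization, or smoothing was used. As an illustration of what the metric measures, the sketch below computes corpus-level BLEU from scratch following the standard definition: clipped n-gram precisions for n = 1 to 4, combined by a uniform geometric mean and multiplied by a brevity penalty. The function names `ngrams` and `corpus_bleu` are hypothetical, whitespace tokenization and the French example sentence are assumptions, and no smoothing is applied, so any hypothesis set with zero matching 4-grams scores 0. This is not the author's evaluation code.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references: List[List[List[str]]],
                hypotheses: List[List[str]],
                max_n: int = 4) -> float:
    """Corpus-level BLEU with uniform n-gram weights and no smoothing (a sketch).

    references[i] holds one or more tokenized reference translations
    for the tokenized system output hypotheses[i].
    """
    clipped = [0] * max_n  # hypothesis n-grams matched, clipped by reference counts
    totals = [0] * max_n   # hypothesis n-grams proposed
    hyp_len = ref_len = 0

    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length to the
        # hypothesis (ties go to the shorter reference).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            # Highest count of each n-gram in any single reference.
            max_ref: Counter = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)

    if min(clipped) == 0:
        return 0.0  # the geometric mean is zero when any precision is zero
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


if __name__ == "__main__":
    ref = "les masques réduisent la transmission du virus".split()
    hyp = "les masques réduisent la transmission du".split()
    # Every n-gram matches, so only the brevity penalty exp(1 - 7/6) applies.
    print(round(corpus_bleu([[ref]], [hyp]), 4))  # prints 0.8465
```

Scores fall between 0 and 1 and are often reported multiplied by 100. For published results, tested implementations such as NLTK's `nltk.translate.bleu_score.corpus_bleu` or the sacreBLEU package are preferable, because BLEU values shift with tokenization and smoothing choices, which matters when two models are compared on the same test set.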
Published in | American Journal of Data Mining and Knowledge Discovery (Volume 6, Issue 1) |
DOI | 10.11648/j.ajdmkd.20210601.12 |
Page(s) | 9-15 |
Creative Commons | This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright | Copyright © The Author(s), 2021. Published by Science Publishing Group |
Keywords | Seq2seq, COVID-19, Bi-GRU, Machine Translation, NLP |
APA Style
Chang, D. (2021). Assisting access to COVID-19 information through deep learning based machine translation: Attention mechanism via bidirectional GRU. American Journal of Data Mining and Knowledge Discovery, 6(1), 9-15. https://doi.org/10.11648/j.ajdmkd.20210601.12
RIS
TY  - JOUR
T1  - Assisting Access to COVID-19 Information Through Deep Learning Based Machine Translation: Attention Mechanism Via Bidirectional GRU
AU  - Chang, Daniel
Y1  - 2021/10/12
PY  - 2021
N1  - https://doi.org/10.11648/j.ajdmkd.20210601.12
DO  - 10.11648/j.ajdmkd.20210601.12
T2  - American Journal of Data Mining and Knowledge Discovery
JF  - American Journal of Data Mining and Knowledge Discovery
JO  - American Journal of Data Mining and Knowledge Discovery
SP  - 9
EP  - 15
PB  - Science Publishing Group
SN  - 2578-7837
UR  - https://doi.org/10.11648/j.ajdmkd.20210601.12
VL  - 6
IS  - 1
ER  - 