Peer-Reviewed

Designing a Voice-controlled Wheelchair for Persian-speaking Users Using Deep Learning Networks with a Small Dataset

Received: 31 July 2021    Accepted: 12 August 2021    Published: 5 November 2021
Abstract

As technology advances, demand has grown for improving the quality of life of elderly and disabled people, and advanced rehabilitation technologies offer them a way to overcome their limitations. Artificial intelligence can make many existing electrical and electronic devices more controllable and more functional. In every society, some people with spinal disabilities lack motor abilities such as moving their limbs; they cannot use a conventional wheelchair and need one with voice control. The main challenge of this project is to identify the voice patterns of disabled people, and audio classification is itself a challenging problem in pattern recognition. This paper presents a method for classifying ambient sounds from their spectrograms using deep neural networks, applied to the voices of Persian speakers in order to build a voice-controlled intelligent wheelchair. We used Inception-V3, a convolutional neural network pretrained on the ImageNet dataset, and then trained it on spectrogram images generated from the ambient sound of about 50 Persian speakers. In experiments, the method achieved a mean accuracy of 83.33%. The design also lets a third party (such as a spouse, child or parent) control the wheelchair through an application installed on a mobile phone. The wheelchair will be able to execute five commands: stop, left, right, front and back.
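The spectrogram step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no sample rate, frame length, hop size or scaling, so the values below are assumptions chosen only to keep the example small, and the function names are hypothetical. The sketch uses only the Python standard library, turns a waveform into a log-magnitude grayscale image, and maps a vector of class scores to one of the five commands. The Inception-V3 network itself is not included; in a real pipeline the image would be resized to the network's input size and passed to the fine-tuned model.

```python
import cmath
import math

# The five commands named in the abstract; the ordering is an assumption.
COMMANDS = ["stop", "left", "right", "front", "back"]


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), acceptable for a small demo)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectrogram_image(samples, frame_len=64, hop=32):
    """Return a grayscale image as a list of rows with pixel values 0..255.

    Rows are frequency bins (highest frequency in the top row) and
    columns are time frames. frame_len and hop are assumed values.
    """
    if len(samples) < frame_len:
        raise ValueError("need at least one full frame of samples")
    window = hann(frame_len)
    columns = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_len], window)]
        # Log-compress the magnitudes; the small offset avoids log(0).
        columns.append([20 * math.log10(m + 1e-10) for m in magnitude_spectrum(frame)])
    lo = min(min(c) for c in columns)
    hi = max(max(c) for c in columns)
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    n_bins = frame_len // 2 + 1
    return [
        [int(round((col[b] - lo) * scale)) for col in columns]
        for b in reversed(range(n_bins))
    ]


def pick_command(scores):
    """Map per-class scores (in COMMANDS order) to the highest-scoring command."""
    if len(scores) != len(COMMANDS):
        raise ValueError("expected one score per command")
    best = max(range(len(scores)), key=scores.__getitem__)
    return COMMANDS[best]


# Usage: a synthetic 1 kHz tone sampled at 8 kHz stands in for a recorded command.
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(512)]
image = spectrogram_image(tone)
print(len(image), "rows x", len(image[0]), "columns")
```

The naive DFT keeps the sketch dependency-free; a practical system would more likely use an FFT routine from a numerical library and might prefer a mel-scaled spectrogram, neither of which the abstract specifies.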

Published in Machine Learning Research (Volume 6, Issue 1)
DOI 10.11648/j.mlr.20210601.11
Page(s) 1-7
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Voice Recognition, Deep Learning, Convolutional Neural Network, Spectrogram, Inception-V3

Cite This Article
  • APA Style

    Mohammad Amiri, Manizheh Ranjbar, Mostafa Azami Gharetappeh. (2021). Designing a Voice-controlled Wheelchair for Persian-speaking Users Using Deep Learning Networks with a Small Dataset. Machine Learning Research, 6(1), 1-7. https://doi.org/10.11648/j.mlr.20210601.11


    ACS Style

    Mohammad Amiri; Manizheh Ranjbar; Mostafa Azami Gharetappeh. Designing a Voice-controlled Wheelchair for Persian-speaking Users Using Deep Learning Networks with a Small Dataset. Mach. Learn. Res. 2021, 6(1), 1-7. doi: 10.11648/j.mlr.20210601.11


    AMA Style

    Mohammad Amiri, Manizheh Ranjbar, Mostafa Azami Gharetappeh. Designing a Voice-controlled Wheelchair for Persian-speaking Users Using Deep Learning Networks with a Small Dataset. Mach Learn Res. 2021;6(1):1-7. doi: 10.11648/j.mlr.20210601.11


  • @article{10.11648/j.mlr.20210601.11,
      author = {Mohammad Amiri and Manizheh Ranjbar and Mostafa Azami Gharetappeh},
      title = {Designing a Voice-controlled Wheelchair for Persian-speaking Users Using Deep Learning Networks with a Small Dataset},
      journal = {Machine Learning Research},
      volume = {6},
      number = {1},
      pages = {1-7},
      doi = {10.11648/j.mlr.20210601.11},
      url = {https://doi.org/10.11648/j.mlr.20210601.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.mlr.20210601.11},
      year = {2021}
    }
    


  • TY  - JOUR
    T1  - Designing a Voice-controlled Wheelchair for Persian-speaking Users Using Deep Learning Networks with a Small Dataset
    AU  - Mohammad Amiri
    AU  - Manizheh Ranjbar
    AU  - Mostafa Azami Gharetappeh
    Y1  - 2021/11/05
    PY  - 2021
    N1  - https://doi.org/10.11648/j.mlr.20210601.11
    DO  - 10.11648/j.mlr.20210601.11
    T2  - Machine Learning Research
    JF  - Machine Learning Research
    JO  - Machine Learning Research
    SP  - 1
    EP  - 7
    PB  - Science Publishing Group
    SN  - 2637-5680
    UR  - https://doi.org/10.11648/j.mlr.20210601.11
    VL  - 6
    IS  - 1
    ER  - 


Author Information
  • Mohammad Amiri, Department of Computer Engineering, Technical and Vocational University (TVU), Tehran, Iran

  • Manizheh Ranjbar, Department of Computer Engineering, Technical and Vocational University (TVU), Tehran, Iran

  • Mostafa Azami Gharetappeh, Department of Computer Engineering, Technical and Vocational University (TVU), Tehran, Iran
