Peer-Reviewed

On the Synthesis of Some Artificial Sounds and Words of Human Speech

Received: 30 June 2021     Accepted: 15 September 2021     Published: 25 November 2021
Abstract

The paper describes numerical experiments in which some sounds and words of human speech are decomposed into separate waves with slowly drifting amplitudes, frequencies, and phases and then summed back, in order to identify which factors are important and which are unimportant for automatic speech recognition. The objective of the study is to investigate the mathematical features of various sounds and words of human speech without using Fourier transforms. Instead, an approximation method developed earlier by the author is used. This method expands periodic or almost periodic functions into a sum of modes with slowly varying (drifting) parameters: amplitudes, frequencies, and phases. Such decompositions were carried out for samples of vowel sounds, simple syllables, and words. The drifting modes were then summed back. Before summation, the mode parameters were deliberately distorted in order to identify which factors are significant and which are insignificant for the essence of the sounds. The functions obtained in this way are artificial sound functions. It turned out that, for vowel sounds, the mode amplitudes may be averaged over a long time without losing the essence of the sound. The phases may likewise be changed by adding an arbitrary random constant without losing the essence of the sound. In many cases it is convenient to determine the parameters not from the sound function itself but from its time derivative. It was also shown that the amplitudes of the summed modes of a sound function can be represented as a sum of several Gaussian functions, both for simple sounds and for syllables. The corresponding mathematical formulas and tables of parameters of the artificial sound functions are presented.
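
The reverse summation described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the mode layout (Gaussian bumps, start frequency, linear frequency drift, initial phase), the linear form of the drift, the per-mode phase offsets, and every numeric value are assumptions chosen for illustration and are not taken from the paper's tables.

```python
import math
import random


def gaussian_sum(t, bumps):
    """Amplitude envelope as a sum of Gaussians given as (height, center, width)."""
    return sum(h * math.exp(-((t - c) / w) ** 2) for h, c, w in bumps)


def synthesize(modes, duration, rate, phase_offsets=None):
    """Sum drifting modes sample by sample.

    Each mode is (bumps, f0, drift, phase0): a Gaussian-sum amplitude,
    an instantaneous frequency f0 + drift * t in Hz (assumed linear drift),
    and an initial phase in radians. phase_offsets adds one constant per mode.
    """
    if phase_offsets is None:
        phase_offsets = [0.0] * len(modes)
    n = int(duration * rate)
    dt = 1.0 / rate
    phases = [mode[3] + off for mode, off in zip(modes, phase_offsets)]
    samples = []
    for i in range(n):
        t = i * dt
        total = 0.0
        for k, (bumps, f0, drift, _) in enumerate(modes):
            total += gaussian_sum(t, bumps) * math.sin(phases[k])
            # Advance the phase by integrating the instantaneous frequency.
            phases[k] += 2.0 * math.pi * (f0 + drift * t) * dt
        samples.append(total)
    return samples


# Hypothetical parameters: two modes with slowly drifting frequencies.
modes = [
    ([(1.0, 0.10, 0.05)], 220.0, 10.0, 0.0),
    ([(0.5, 0.08, 0.03), (0.3, 0.14, 0.03)], 440.0, -20.0, 0.0),
]

random.seed(0)  # fixed seed only so the example is reproducible
offsets = [random.uniform(0.0, 2.0 * math.pi) for _ in modes]

plain = synthesize(modes, 0.25, 8000)
shifted = synthesize(modes, 0.25, 8000, offsets)  # constant random phase added
```

Here `plain` and `shifted` share the same amplitude envelopes and frequency tracks and differ only in a constant phase added to each mode, which is the kind of distortion the abstract reports as not changing the essence of the sound. Whether the constant is shared by all modes or drawn per mode is not stated in the abstract; the per-mode choice above is one reading.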

Published in Science, Technology & Public Policy (Volume 5, Issue 2)
DOI 10.11648/j.stpp.20210502.16
Page(s) 115-123
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2021. Published by Science Publishing Group

Keywords

Speech and Person Recognition Through Voice, Speech Technologies, Data Processing, Fourier Transform, Transform Voice, Expansion of Quasiperiodic Signals into Base Frequencies

Cite This Article
  • APA Style

    Viachaslau Vladimirovich Mitsianok. (2021). On the Synthesis of Some Artificial Sounds and Words of Human Speech. Science, Technology & Public Policy, 5(2), 115-123. https://doi.org/10.11648/j.stpp.20210502.16


    ACS Style

    Viachaslau Vladimirovich Mitsianok. On the Synthesis of Some Artificial Sounds and Words of Human Speech. Sci. Technol. Public Policy 2021, 5(2), 115-123. doi: 10.11648/j.stpp.20210502.16


    AMA Style

    Viachaslau Vladimirovich Mitsianok. On the Synthesis of Some Artificial Sounds and Words of Human Speech. Sci Technol Public Policy. 2021;5(2):115-123. doi: 10.11648/j.stpp.20210502.16


  • @article{10.11648/j.stpp.20210502.16,
      author = {Viachaslau Vladimirovich Mitsianok},
      title = {On the Synthesis of Some Artificial Sounds and Words of Human Speech},
      journal = {Science, Technology \& Public Policy},
      volume = {5},
      number = {2},
      pages = {115-123},
      doi = {10.11648/j.stpp.20210502.16},
      url = {https://doi.org/10.11648/j.stpp.20210502.16},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.stpp.20210502.16},
     year = {2021}
    }
    


  • TY  - JOUR
    T1  - On the Synthesis of Some Artificial Sounds and Words of Human Speech
    AU  - Viachaslau Vladimirovich Mitsianok
    Y1  - 2021/11/25
    PY  - 2021
    N1  - https://doi.org/10.11648/j.stpp.20210502.16
    DO  - 10.11648/j.stpp.20210502.16
    T2  - Science, Technology & Public Policy
    JF  - Science, Technology & Public Policy
    JO  - Science, Technology & Public Policy
    SP  - 115
    EP  - 123
    PB  - Science Publishing Group
    SN  - 2640-4621
    UR  - https://doi.org/10.11648/j.stpp.20210502.16
    VL  - 5
    IS  - 2
    ER  - 


Author Information
  • Department of Engineering, Palessie State University, Pinsk, Belarus
