This contribution describes the second stage in the development of a language training system, programmed in Python, intended for speech therapy in Spanish-speaking countries, beginning with a study in Cuba. The first stage of this research was carried out in Matlab by analyzing the dynamics of change of the codebook centroids extracted from words pronounced by a speaker. In the second stage, the Variational Coefficient formula is used to estimate the percentage of effectiveness with which the speaker performs the voice training. A modified way of computing the variational coefficient is adopted as a measure of the dispersion of a group of vectors: the mean of the group is taken as the vector that represents the phonetic boundaries of the word to be trained. In addition, a novel approach to word recognition is used, based on the K-Nearest Training Matrix (KNTM) algorithm, which relies on matrix similarity analysis and takes the Frobenius norm as the measure for distinguishing similar from non-similar characteristics of a matrix with respect to a database of matrices. To reduce the computational cost of the program and speed up its execution, the training matrices of the database are saved in files with a .tex extension; after the training process, the program only needs to read these files instead of recalculating the matrices, which significantly reduces the running time of the algorithm.
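The abstract names three computational pieces without giving their formulas: a dispersion measure of a group of vectors around their mean, a nearest-match search over training matrices ranked by the Frobenius norm, and on-disk caching of the training matrices. The Python sketch below shows one way these pieces could fit together; it is an illustration, not the authors' code. Assumptions: the coefficient is computed as the root-mean-square distance to the mean vector divided by the norm of that mean; KNTM is modelled as a k-nearest-neighbour majority vote, since the abstract does not state the decision rule; the cache format (whitespace-separated text in a file named `casa_0.tex`) and all function names are hypothetical.

```python
import math
import os
import tempfile
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def variational_coefficient(vectors: Sequence[Sequence[float]]) -> float:
    """Dispersion of a group of vectors around their mean vector.

    Assumption: the mean vector stands in for the reference vector of the
    word's phonetic boundaries, and the coefficient is the RMS Euclidean
    distance to that mean divided by the norm of the mean.
    """
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    mean_norm = math.sqrt(sum(x * x for x in mean))
    if mean_norm == 0.0:
        raise ValueError("mean vector has zero norm; coefficient undefined")
    sq = sum(sum((v[j] - mean[j]) ** 2 for j in range(dim)) for v in vectors)
    return math.sqrt(sq / n) / mean_norm


def frobenius_distance(a: Matrix, b: Matrix) -> float:
    """||A - B||_F: square root of the summed squared element differences."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def k_nearest_label(query: Matrix, database: Dict[str, List[Matrix]],
                    k: int = 3) -> Tuple[str, List[Tuple[float, str]]]:
    """Hypothetical KNTM-style rule: majority vote among the k closest matrices."""
    scored = sorted((frobenius_distance(query, m), label)
                    for label, mats in database.items() for m in mats)
    nearest = scored[:k]
    votes: Dict[str, int] = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    # Break vote ties in favour of the label with the closest single match.
    best = max(votes, key=lambda lbl: (
        votes[lbl], -min(d for d, lab in nearest if lab == lbl)))
    return best, nearest


def save_matrix(path: str, m: Matrix) -> None:
    """Cache a training matrix as whitespace-separated text, one row per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in m:
            f.write(" ".join(repr(float(x)) for x in row) + "\n")


def load_matrix(path: str) -> Matrix:
    """Read back a matrix written by save_matrix without recomputing it."""
    with open(path, encoding="utf-8") as f:
        return [[float(x) for x in line.split()] for line in f if line.strip()]


# Toy example with made-up 2x2 matrices (not data from the paper).
cache_dir = tempfile.mkdtemp()
cached = os.path.join(cache_dir, "casa_0.tex")
save_matrix(cached, [[1.0, 0.0], [0.0, 1.0]])
database = {
    "casa": [load_matrix(cached), [[1.1, 0.0], [0.0, 0.9]]],
    "perro": [[[5.0, 5.0], [5.0, 5.0]]],
}
label, nearest = k_nearest_label([[1.0, 0.1], [0.0, 1.0]], database, k=3)
print(label)  # casa
print(variational_coefficient([[1.0, 0.0], [3.0, 0.0]]))  # 0.5
```

Reading the cached file replaces recomputation of that training matrix, which is the saving the abstract describes. The abstract does not say how the coefficient is converted into a percentage of effectiveness, so the sketch stops at the raw coefficient.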
Published in: Mathematics and Computer Science (Volume 6, Issue 2)
DOI: 10.11648/j.mcs.20210602.12
Page(s): 38-44
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright © The Author(s), 2021. Published by Science Publishing Group
Keywords: Mel Frequency Cepstral Coefficients, Vector Quantization, Variational Coefficient, Word Recognition
APA Style
Leandro Daniel Lau Alfonso, Sergio Suarez Guerra, Jose Luis Oropeza Rodriguez, Roberto Rodriguez Morales, Gustavo Asumu Mboro Nchama. (2021). Python Language Training System Based on MFCC, VQ, Variational Coefficient and KNTM Algorithm. Mathematics and Computer Science, 6(2), 38-44. https://doi.org/10.11648/j.mcs.20210602.12
Publication date: 2021/05/14
ISSN: 2575-6028