UZBEK COMMANDS RECOGNITION BY PROCESSING THE SPECTROGRAM IMAGE

Musayev M M; Khujayorov I Sh; Abdullaeva M I; Ochilov M M

This paper describes the most common algorithms that use the image approach to speech
recognition: a convolutional neural network (CNN) and the two-dimensional DCT (2D-DCT)
combined with the machine learning classifiers KNN, SVM and RF. These algorithms are
evaluated for applicability to the Uzbek language, and a comparative analysis of their accuracy
and recognition rate is given. Command words of the Uzbek language were chosen for the
experiments. According to the results, both methods give high recognition accuracy: 92% (CNN)
and 90% (2D-DCT+Zigzag+SVM). The combinations 2D-DCT+Zigzag+KNN and 2D-DCT+Zigzag+RF,
with average recognition accuracy of 86% and 85%, respectively, are also considered in the
paper.
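
The abstract names a 2D-DCT+Zigzag feature pipeline but gives no parameters, so the following is only a minimal sketch of that idea, not the authors' implementation. It assumes an orthonormal DCT-II, a JPEG-style zigzag scan, and a hypothetical parameter `n_coeffs` for the number of low-frequency coefficients kept as the feature vector. The spectrogram is represented as a plain list of rows, and the classifier stage (KNN, SVM or RF) is left out.

```python
import math


def dct_1d(vec):
    """Orthonormal 1D DCT-II of a list of numbers."""
    n = len(vec)
    out = []
    for k in range(n):
        s = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, v in enumerate(vec))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(image):
    """Separable 2D DCT-II: transform every row, then every column."""
    rows, cols = len(image), len(image[0])
    row_pass = [dct_1d(row) for row in image]
    # col_pass[c][u] holds coefficient u of column c
    col_pass = [dct_1d([row_pass[r][c] for r in range(rows)])
                for c in range(cols)]
    # Transpose back so the result is indexed [row_freq][col_freq]
    return [[col_pass[c][u] for c in range(cols)] for u in range(rows)]


def zigzag(matrix):
    """Read a matrix along anti-diagonals, alternating direction (JPEG order)."""
    rows, cols = len(matrix), len(matrix[0])
    out = []
    for s in range(rows + cols - 1):
        diag = [(r, s - r) for r in range(rows) if 0 <= s - r < cols]
        if s % 2 == 0:
            diag.reverse()  # even diagonals run from bottom-left to top-right
        out.extend(matrix[r][c] for r, c in diag)
    return out


def dct_zigzag_features(spectrogram, n_coeffs):
    """Keep the first n_coeffs zigzag-ordered DCT coefficients (assumed parameter)."""
    return zigzag(dct_2d(spectrogram))[:n_coeffs]
```

The zigzag scan places low-frequency coefficients first, so truncating the scanned list keeps the coarse structure of the spectrogram image in a short, fixed-length vector that a conventional classifier can consume. In practice a numerical library would replace the pure-Python DCT used here for self-containment.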

Name of journal: Technical science and innovation
Number of edition: 2022, №2(12)

Date of creation in the UzSCI system: 07-09-2022

Date of publication: 30-08-2022

Main Language: English

Pages: 55-61

Tags

Spectrogram image

feature extraction

speech classification

№	Author name	Position	Name of organisation
1	Musayev M.M.	teacher	TUIT
2	Khujayorov I.S.	teacher	Samarkand branch of TUIT
3	Abdullaeva M.I.	teacher	TUIT
4	Ochilov M.M.	teacher	TUIT
