SO‘Z MA’NOSINI ANIQLASHDA NAIVE BAYES ALGORITMIDAN FOYDALANISH [USING THE NAIVE BAYES ALGORITHM FOR WORD SENSE DISAMBIGUATION]

Elov Botir Boltayevich; Axmedova Xolisxon Ilxomovna

DOI: https://dx.doi.org/10.36522/2181-9637-2023-3-4

Word sense disambiguation is one of the pressing problems in natural language processing, and homonyms are an important element of this problem. Methods based on machine learning play a special role in solving it, and the Naive Bayes classifier is one such method. When resolving homonymy between different but grammatically similar parts of speech in Uzbek, the Naive Bayes classifier stands out from other methods for its simplicity and speed. It is one of the most popular algorithms for multi-class classification, and, depending on the data under consideration, any of the three variants of the Naive Bayes algorithm (Gaussian, Multinomial, Bernoulli) can be used. This article describes in detail how the classifier is applied to identify homonymy between grammatically similar parts of speech in Uzbek.
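The abstract describes the classifier only in outline. As a minimal sketch, assuming that each occurrence of an ambiguous word is represented by a bag of words from its context, a multinomial Naive Bayes classifier chooses the class c that maximizes log P(c) + Σ log P(w | c), where P(c) is the prior and P(w | c) the smoothed word likelihood; normalizing these scores gives the posterior probabilities. The toy contexts for the Uzbek homonym "yoz" ("summer", noun, vs. "write", verb) are invented for illustration and are not the authors' data, and the names `MultinomialNaiveBayes` and `tokenize` are hypothetical. scikit-learn's `MultinomialNB` combined with `CountVectorizer` implements essentially the same model; this standard-library version only makes the prior and posterior computations visible.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, contexts, labels):
        self.classes = sorted(set(labels))
        class_counts = Counter(labels)
        n = len(labels)
        # Prior P(c): share of training contexts labelled with class c.
        self.log_prior = {c: math.log(class_counts[c] / n) for c in self.classes}
        word_counts = defaultdict(Counter)
        for text, c in zip(contexts, labels):
            word_counts[c].update(tokenize(text))
        # Vocabulary: every word seen in any class.
        self.vocab = {w for counts in word_counts.values() for w in counts}
        v = len(self.vocab)
        # Likelihood P(w | c), smoothed so unseen (w, c) pairs are not zero.
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(word_counts[c].values())
            self.log_likelihood[c] = {
                w: math.log((word_counts[c][w] + self.alpha) / (total + self.alpha * v))
                for w in self.vocab
            }
        return self

    def joint_log(self, text):
        # Unnormalized log posterior: log P(c) + sum of log P(w | c).
        # Words never seen in training are skipped.
        words = [w for w in tokenize(text) if w in self.vocab]
        return {
            c: self.log_prior[c] + sum(self.log_likelihood[c][w] for w in words)
            for c in self.classes
        }

    def predict_proba(self, text):
        # Posterior P(c | context), normalized with the log-sum-exp trick.
        scores = self.joint_log(text)
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, text):
        scores = self.joint_log(text)
        return max(scores, key=scores.get)


# Hypothetical toy training data for the homonym "yoz".
contexts = [
    "bu yoz juda issiq keldi",
    "yoz faslida dam olamiz",
    "o'tgan yoz dengizga bordik",
    "xatni tezroq yoz",
    "daftarga mashqni yoz",
    "ukangga xat yoz",
]
labels = ["NOUN", "NOUN", "NOUN", "VERB", "VERB", "VERB"]

model = MultinomialNaiveBayes().fit(contexts, labels)
print(model.predict("yoz faslida issiq"))  # → NOUN
print(model.predict("onamga xat yoz"))     # → VERB
print(model.predict_proba("onamga xat yoz"))
```

Add-alpha smoothing keeps a single unseen context word from driving a class probability to zero. The Gaussian and Bernoulli variants named in the abstract would replace the word-count likelihood with a normal density over continuous features or with binary presence/absence features, respectively.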

Journal: ИЛМ-ФАН ВА ИННОВАЦИОН РИВОЖЛАНИШ (Science and Innovative Development)
Issue: 2023, No. 3

URL: https://ilm.mininnovation.uz/index.php/journal/article/view/416

Date added to UzSCI: 22-08-2023

Publication date: 22-06-2023

Article language: Uzbek

Pages: 44-54

Keywords

natural language processing

word sense disambiguation

homonymy

Naive Bayes classifier

text classification

prior and posterior probabilities

scikit-learn library

No.	Author	Position	Organization
1	Elov B.B.	Doctor of Philosophy (PhD) in Technical Sciences, Associate Professor	Tashkent State University of Uzbek Language and Literature named after Alisher Navoiy
2	Axmedova X.I.	Doctoral student	Tashkent State University of Uzbek Language and Literature named after Alisher Navoiy
