Word sense disambiguation is one of the pressing problems in natural language processing, and homonyms are a central element of it. Machine learning methods play a special role in solving this problem, and the Naive Bayes classifier is one such method. For resolving homonymy between different but grammatically similar parts of speech in Uzbek, the Naive Bayes classifier stands out from other methods for its simplicity and speed. It is one of the most widely used multi-class classification algorithms, and depending on the data at hand, any of the three Naive Bayes variants (Gaussian, Multinomial, Bernoulli) can be used. This article describes in detail how the classifier is applied to identify homonymy between grammatically similar parts of speech in Uzbek.
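The abstract does not specify the features or the implementation, so what follows is only a minimal sketch of how a multinomial Naive Bayes classifier could choose the part of speech of a homonym from its context words. It assumes bag-of-words context features, Laplace smoothing, and a tiny hypothetical training set for the Uzbek homonym "yoz" (noun "summer" vs. verb "write"). The class name `ContextNaiveBayes` and all data are illustrative and are not the authors' code. For reference, scikit-learn provides the three variants mentioned in the abstract as `GaussianNB`, `MultinomialNB` and `BernoulliNB` in `sklearn.naive_bayes`.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: context words around the Uzbek homonym "yoz",
# labelled with its part of speech ("NOUN" = summer, "VERB" = write).
TRAIN = [
    (["issiq", "keldi"], "NOUN"),             # issiq yoz keldi
    (["bu", "issiq", "edi"], "NOUN"),         # bu yoz issiq edi
    (["bahordan", "keyin", "keladi"], "NOUN"),  # yoz bahordan keyin keladi
    (["menga", "xat"], "VERB"),               # menga xat yoz
    (["daftarga", "xat"], "VERB"),            # daftarga xat yoz
    (["ruchka", "bilan", "xat"], "VERB"),     # ruchka bilan xat yoz
]


class ContextNaiveBayes:
    """Illustrative multinomial Naive Bayes over bag-of-words contexts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, samples):
        self.class_counts = Counter(label for _, label in samples)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        for words, label in samples:
            self.word_counts[label].update(words)
        self.vocab = {w for words, _ in samples for w in words}
        self.total = len(samples)
        return self

    def log_posterior(self, words, label):
        # log P(label) + sum of log P(word | label), computed in log space to
        # avoid underflow; P(words) is identical for every label and is dropped.
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        score = math.log(self.class_counts[label] / self.total)
        for w in words:
            if w in self.vocab:  # words never seen in training are skipped
                score += math.log((counts[w] + self.alpha) / denom)
        return score

    def predict(self, words):
        # Pick the label with the highest (unnormalised) log posterior.
        return max(self.class_counts, key=lambda c: self.log_posterior(words, c))


model = ContextNaiveBayes().fit(TRAIN)
print(model.predict(["issiq", "keladi"]))  # NOUN
print(model.predict(["xat", "bilan"]))     # VERB
```

The multinomial variant fits this sketch because the features are word counts; binary presence/absence features would suit the Bernoulli variant, and continuous-valued features the Gaussian one.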
No. | Author | Position | Organization |
---|---|---|---|
1 | Elov B.B. | Doctor of Philosophy (PhD) in Technical Sciences, Associate Professor | Alisher Navoi Tashkent State University of Uzbek Language and Literature |
2 | Axmedova X.I. | PhD student | Alisher Navoi Tashkent State University of Uzbek Language and Literature |