Development of a Context-Free Grammar (CFG)-based model and algorithm for predicate identification in simple Uzbek sentences.
Authors
Maksud S. Sharipov, Ixtiyor D. Avezmatov

Abstract
Syntactic parsing is one of the most critical stages of analysis in Natural Language Processing (NLP). It identifies sentence and phrase types and exposes the grammatical relations between words. Because Uzbek is an agglutinative language, its morphological and syntactic analysis requires specially tailored approaches. Uzbek is also a low-resource language, and to date there are no sufficiently robust models for syntactic parsing of Uzbek texts. In the syntactic analysis of Uzbek, the central challenge is to identify the predicate (verb phrase) among the sentence constituents. This article therefore develops a model and an algorithm for predicate identification in sentences based on Context-Free Grammar (CFG), along with accompanying IDEF0 and IDEF1X models. These models present the architecture of the proposed system schematically: the IDEF0 model describes its functional capabilities, while the IDEF1X model depicts the relationships among objects. A rule-based algorithm for detecting the predicate in Uzbek sentences has also been implemented. During execution, the program first calls the previously developed Python library UzbekTagger to determine the part of speech of the word under examination, and then invokes the UzbekLemmatizer library to identify the word's affixes. A dedicated database that classifies Uzbek words by their parts of speech has also been created.
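The two-step flow described in the abstract (part-of-speech lookup, then affix identification, then a rule that picks the predicate) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `toy_tag` and `toy_split_affix` are hypothetical stand-ins and do not reproduce the actual UzbekTagger or UzbekLemmatizer interfaces, the word list and affix list are tiny hand-made samples, and the right-to-left search simply relies on the typical predicate-final word order of Uzbek rather than on the article's CFG rules.

```python
from typing import List, NamedTuple, Optional, Tuple

# Hypothetical toy data; a real system would query a tagger and a lexicon database.
TOY_POS = {
    "men": "PRON", "u": "PRON",
    "kitobni": "NOUN", "talaba": "NOUN",
    "o'qidim": "VERB", "keldi": "VERB",
}
TOY_AFFIXES = ["dim", "di", "ni"]  # checked in this order, longest first


class Token(NamedTuple):
    word: str
    pos: str
    stem: str
    affix: str


def toy_tag(word: str) -> str:
    """Stand-in for a part-of-speech tagger (not the UzbekTagger API)."""
    return TOY_POS.get(word.lower(), "UNK")


def toy_split_affix(word: str) -> Tuple[str, str]:
    """Stand-in for a lemmatizer: strip one known affix, if any."""
    lowered = word.lower()
    for affix in TOY_AFFIXES:
        if lowered.endswith(affix) and len(lowered) > len(affix):
            return lowered[: -len(affix)], affix
    return lowered, ""


def analyze(sentence: str) -> List[Token]:
    """Tag each word first, then identify its affix, as in the described flow."""
    tokens = []
    for raw in sentence.split():
        word = raw.strip(".,!?")
        if not word:
            continue
        pos = toy_tag(word)
        stem, affix = toy_split_affix(word)
        tokens.append(Token(word, pos, stem, affix))
    return tokens


def find_predicate(sentence: str) -> Optional[Token]:
    """Assumed heuristic: the last verb is the predicate; otherwise fall back
    to the final word, which covers simple nominal predicates."""
    tokens = analyze(sentence)
    for token in reversed(tokens):
        if token.pos == "VERB":
            return token
    return tokens[-1] if tokens else None


if __name__ == "__main__":
    for text in ["Men kitobni o'qidim.", "U talaba."]:
        print(text, "->", find_predicate(text))
```

In this sketch, "Men kitobni o'qidim." yields the verb "o'qidim" split into the stem "o'qi" and the affix "dim", while "U talaba." has no verb and falls back to the final noun "talaba". A grammar-driven implementation would replace the right-to-left search with CFG productions over the tagged tokens.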
