Development and evaluation of a lexical substitution algorithm for Russian based on predictive neural network models


This paper addresses the lexical substitution task for the Russian language. Lexical substitution is the task of determining the most suitable substitute for a given target word in context. Although the task has been actively researched for English and some other European languages, there is little data for Russian. Moreover, few studies consider the type of semantic relation between the target word and its substitutes. Our algorithm works with Russian and produces synonym, hypernym, and hyponym substitutes. We use the RuWordNet lexical database to predict candidate substitutes and fastText word embeddings to rank them. The algorithm is evaluated through psycholinguistic experiments, and the results are analyzed in the paper. The research results may be of interest to specialists in computational linguistics and artificial intelligence and can be applied to NLP tasks such as paraphrasing, machine translation, and text simplification, as well as to linguodidactics.
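
The ranking step can be illustrated with a minimal sketch. It assumes that candidates arrive labeled with their relation type (as a lookup in a resource like RuWordNet might provide) and that ranking is done by cosine similarity between the target vector and each candidate vector. Both the scoring rule and the tiny hand-made vectors are assumptions for illustration only; they stand in for real fastText embeddings, and the sketch ignores the sentence context that the actual algorithm may take into account. The names `rank_candidates`, `TOY_VECTORS`, and `CANDIDATES` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(target, candidates, vectors):
    """Sort (word, relation) candidates by similarity to the target word.

    Candidates without a vector are skipped. Returns a list of
    (word, relation, score) tuples, most similar first.
    """
    target_vec = vectors[target]
    scored = [
        (word, relation, cosine(target_vec, vectors[word]))
        for word, relation in candidates
        if word in vectors
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Toy 3-dimensional vectors (assumption: placeholders for fastText embeddings).
TOY_VECTORS = {
    "собака": [0.9, 0.1, 0.0],
    "пёс": [0.85, 0.15, 0.05],
    "животное": [0.5, 0.5, 0.1],
    "овчарка": [0.8, 0.0, 0.3],
}

# Hypothetical candidates with relation types, as a thesaurus lookup might return.
CANDIDATES = [
    ("животное", "hypernym"),
    ("пёс", "synonym"),
    ("овчарка", "hyponym"),
]

ranked = rank_candidates("собака", CANDIDATES, TOY_VECTORS)
for word, relation, score in ranked:
    print(f"{word}\t{relation}\t{score:.3f}")
```

Keeping the relation label attached to each candidate makes it possible to report, for every ranked substitute, whether it is a synonym, hypernym, or hyponym of the target word.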