Mapping word frequencies in fiction on sociopolitical context: the case of early 20th century Russian short stories


The paper deals with the language of Russian short stories written between 1900 and 1930. It is based on the Russian Short Stories Corpus, an ongoing research project aimed at collecting, digitally processing, and presenting Russian literature of the early 20th century in electronic form. The Corpus contains stories written by thousands of Russian authors, both well known and almost forgotten. A sample was drawn from the Corpus to serve as a testbed for linguists, lexicographers, and literary scholars, enabling them to check their intuitions about the language and style of the epoch. The sample has been divided into three subsamples along the lines set by the dramatic turns of Russian history: the first contains stories written from the beginning of the 20th century up to World War I (1900–1913), the second covers the tumultuous period of wars and revolutions (1914–1922), and the third covers stories written in the Soviet Union (1923–1930). The Corpus has proved instrumental in detecting manifold changes in language use, including grammar, vocabulary, syntactic patterns, collocations, and stylistics. In the present paper, frequency-sorted word lists are used to bring out relevant changes in Russian vocabulary and to link them to the sociopolitical context. The results will provide valuable data for lexicographers compiling dictionaries of Russian for this period.
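The comparison of frequency-sorted word lists across the three subsamples can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes that each story is available as a `(year, text)` pair, uses a naive regex tokenizer with no lemmatization (a real study of Russian, a highly inflected language, would likely need one), and normalizes counts to instances per million words (ipm) in case the subsamples differ in size. The period labels and all function names (`frequency_lists`, `ipm_table`, `biggest_rises`, etc.) are hypothetical.

```python
import re
from collections import Counter

# Subsample boundaries as given in the abstract; the labels are hypothetical.
PERIODS = {
    "1900-1913": (1900, 1913),
    "1914-1922": (1914, 1922),
    "1923-1930": (1923, 1930),
}

# Naive tokenizer (assumption): runs of Unicode letters, optionally hyphenated
# (e.g. "какой-то"); digits and punctuation are dropped, no lemmatization.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def tokenize(text):
    return [t.lower() for t in TOKEN_RE.findall(text)]


def period_of(year):
    """Return the subsample label for a year, or None if it is out of range."""
    for label, (start, end) in PERIODS.items():
        if start <= year <= end:
            return label
    return None


def frequency_lists(stories):
    """stories: iterable of (year, text) pairs -> {period label: Counter}."""
    counts = {label: Counter() for label in PERIODS}
    for year, text in stories:
        label = period_of(year)
        if label is not None:
            counts[label].update(tokenize(text))
    return counts


def top_words(counter, n=10):
    """Frequency-sorted word list: the n most frequent words with raw counts."""
    return counter.most_common(n)


def ipm_table(counter):
    """Convert raw counts to instances per million tokens."""
    total = sum(counter.values())
    return {w: c * 1_000_000 / total for w, c in counter.items()} if total else {}


def biggest_rises(before, after, n=10):
    """Words whose ipm grew most from one subsample to the next."""
    b, a = ipm_table(before), ipm_table(after)
    shifts = {w: a.get(w, 0.0) - b.get(w, 0.0) for w in set(a) | set(b)}
    return sorted(shifts.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    # Toy input invented for illustration only; it is not corpus data.
    stories = [
        (1905, "Тихий вечер. Барин вышел в сад."),
        (1918, "Товарищ, революция! Товарищ комиссар пришёл."),
        (1927, "Товарищ председатель колхоза говорил о плане."),
    ]
    lists = frequency_lists(stories)
    print(top_words(lists["1914-1922"], 3))
    print(biggest_rises(lists["1900-1913"], lists["1914-1922"], 3))
```

Comparing normalized rather than raw frequencies keeps subsamples of unequal size comparable; on real data one would also want lemmatization and some significance measure before reading a shift as a genuine vocabulary change.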