<?xml version="1.0" encoding="utf-8"?>
<journal>
  <titleid>80301</titleid>
  <issn>2782-5450</issn>
  <journalInfo lang="ENG">
    <title>Terra Linguistica</title>
  </journalInfo>
  <issue>
    <volume>15</volume>
    <number>3</number>
    <altNumber> </altNumber>
    <dateUni>2024</dateUni>
    <pages>1-123</pages>
    <articles>
      <article>
        <artType>RAR</artType>
        <langPubl>RUS</langPubl>
        <pages>7-14</pages>
        <authors>
          <author num="001">
            <individInfo lang="ENG">
              <orgName>Siberian Federal University</orgName>
              <surname>Burmakina</surname>
              <initials>Natalia</initials>
              <email>nburmakina@mail.ru</email>
              <address>Krasnoyarsk, Russian Federation</address>
            </individInfo>
          </author>
        </authors>
        <artTitles>
          <artTitle lang="ENG">Automated text simplification in the field of inclusive social addressing</artTitle>
        </artTitles>
        <abstracts>
          <abstract lang="ENG">The article substantiates the relevance of simplifying texts to improve the effectiveness of written communication with people with mental deficits. The task is to consider the possibilities of automatic text simplification tools. The results of automatic text simplification via ChatGPT and neuro-texter.ru are analyzed. The methods used are discourse analysis of the text, the comparative method, and quantitative assessment of linguistic indicators. Quantitative indicators of the adapted texts, the complexity of syntactic structures, graphic design and distortions of the content are considered. The conclusion is that automatic simplification of written texts is insufficient for the clinical context: the adapted texts contain complex syntactic structures and unjustified lexical substitutions, and their quantitative characteristics do not meet the requirements of easy language.</abstract>
        </abstracts>
        <codes>
          <doi>10.18721/JHSS.15301</doi>
          <udk>81</udk>
        </codes>
        <keywords>
          <kwdGroup lang="ENG">
            <keyword>easy language</keyword>
            <keyword>text simplification</keyword>
            <keyword>social inclusion</keyword>
            <keyword>readability</keyword>
            <keyword>text adaptation</keyword>
          </kwdGroup>
        </keywords>
        <files>
          <furl>https://human.spbstu.ru/article/2024.57.1/</furl>
          <file>7-14.pdf</file>
        </files>
      </article>
      <article>
        <artType>RAR</artType>
        <langPubl>RUS</langPubl>
        <pages>15-27</pages>
        <authors>
          <author num="001">
            <individInfo lang="ENG">
              <orgName>Russian State University for the Humanities</orgName>
              <surname>Butorina</surname>
              <initials>Elena</initials>
              <email>elenabutorina@yandex.ru</email>
              <address>Moscow, Russian Federation</address>
            </individInfo>
          </author>
        </authors>
        <artTitles>
          <artTitle lang="ENG">Grammatical feature of animacy in some Russian compounds in media</artTitle>
        </artTitles>
        <abstracts>
          <abstract lang="ENG">The paper discusses the grammatical feature of animacy in business discourse compounds whose first and second components differ in animacy, e.g. gosudarstvo-otvetchik (respondent State) and strana – eksportyor narkotikov (drug-exporting country), on the basis of contexts from the main corpus and newspaper subcorpus of the Russian National Corpus and results from the Yandex search engine. The purpose of the research is to identify differences in the preferred value of the animacy category in business media discourse for different types of composites: compound words and syntactic constructions that include an apposition. The methods used in the study include modeling, classification, contextual analysis, qualitative and quantitative comparison, and surveying language speakers. To test hypotheses, Pearson's Chi-squared test is applied, and the corresponding coefficients are calculated using the CORREL function in Microsoft Excel. Cases in which the dictionary value of morphological animacy is retained in syntactic structures are analyzed. Of particular interest are situations in which the animacy feature spreads from one component of the compound to the whole compound. Grammatical and cognitive prerequisites for the assimilation of compound members in grammatical animacy are proposed. The possibility of agreement of adjectives, determinative pronouns, ordinal numerals and participles with such compounds is considered.</abstract>
        </abstracts>
        <codes>
          <doi>10.18721/JHSS.15302</doi>
          <udk>81’38</udk>
        </codes>
        <keywords>
          <kwdGroup lang="ENG">
            <keyword>animacy</keyword>
            <keyword>compounds</keyword>
            <keyword>business discourse</keyword>
            <keyword>media</keyword>
            <keyword>Russian</keyword>
          </kwdGroup>
        </keywords>
        <files>
          <furl>https://human.spbstu.ru/article/2024.57.2/</furl>
          <file>15-27.pdf</file>
        </files>
      </article>
      <article>
        <artType>RAR</artType>
        <langPubl>RUS</langPubl>
        <pages>28-34</pages>
        <authors>
          <author num="001">
            <authorCodes>
              <orcid>0000-0002-4033-4211</orcid>
            </authorCodes>
            <individInfo lang="ENG">
              <orgName>Moscow City University</orgName>
              <surname>Vasilev</surname>
              <initials>Kirill</initials>
              <email>vasilevkv@mgpu.ru</email>
              <address>Moscow, Russian Federation</address>
            </individInfo>
          </author>
        </authors>
        <artTitles>
          <artTitle lang="ENG">Verbs denoting changes of colour: linguopragmatic analysis based on Russian National Corpus</artTitle>
        </artTitles>
        <abstracts>
          <abstract lang="ENG">The paper looks into the search potential of the Russian National Corpus (RNC). When investigating a lexical group with constant semantic and grammatical features, researchers have to rely on corpus search technologies, yet the corpus offers rather limited options. The lemmas and tags of RNC search do not include colour as a search tag for verbs, so this function is passed to the tools discussed in the study, for example, ‘Word at a glance’. The service can be used to select lexical units occurring in homogeneous contexts, to establish their semantic proximity, and to determine semantic and grammatical characteristics applicable to follow-up corpus queries. The study focuses on the verbs denoting changes of colour, their associative field, semantic proximity and combinability in collocations, and the grammatical and semantic features suitable for refining subsequent corpus queries. The findings indicate that the RNC is designed to recognize more criteria at the markup stage than are available for lexical and grammatical search, which outlines prospects for further corpus development.</abstract>
        </abstracts>
        <codes>
          <doi>10.18721/JHSS.15303</doi>
          <udk>81-13</udk>
        </codes>
        <keywords>
          <kwdGroup lang="ENG">
            <keyword>verbs denoting changes of colour</keyword>
            <keyword>corpus study</keyword>
            <keyword>semantics</keyword>
            <keyword>tokenization</keyword>
            <keyword>Russian National Corpus</keyword>
          </kwdGroup>
        </keywords>
        <files>
          <furl>https://human.spbstu.ru/article/2024.57.3/</furl>
          <file>28-34.pdf</file>
        </files>
      </article>
      <article>
        <artType>RAR</artType>
        <langPubl>RUS</langPubl>
        <pages>35-46</pages>
        <authors>
          <author num="001">
            <authorCodes>
              <orcid>0000-0003-2856-5049</orcid>
            </authorCodes>
            <individInfo lang="ENG">
              <orgName>St. Petersburg State University</orgName>
              <surname>Grebennikov</surname>
              <initials>Alexander</initials>
              <email>agrebennikov@mail.ru</email>
              <address>St. Petersburg, Russian Federation</address>
            </individInfo>
          </author>
          <author num="002">
            <authorCodes>
              <orcid>0000-0001-5953-5403</orcid>
            </authorCodes>
            <individInfo lang="ENG">
              <orgName>Saint Petersburg State University</orgName>
              <surname>Ivanova</surname>
              <initials>Ekaterina</initials>
              <address>St. Petersburg, Russian Federation</address>
            </individInfo>
          </author>
          <author num="003">
            <authorCodes>
              <orcid>0000-0001-8946-4431</orcid>
            </authorCodes>
            <individInfo lang="ENG">
              <orgName>Saint Petersburg State University</orgName>
              <surname>Koryshev</surname>
              <initials>Mikhail</initials>
              <address>St. Petersburg, Russian Federation</address>
            </individInfo>
          </author>
          <author num="004">
            <authorCodes>
              <orcid>0000-0001-7352-353X</orcid>
            </authorCodes>
            <individInfo lang="ENG">
              <orgName>Saint Petersburg State University</orgName>
              <surname>Solovieva</surname>
              <initials>Maria</initials>
              <address>St. Petersburg, Russian Federation</address>
            </individInfo>
          </author>
        </authors>
        <artTitles>
          <artTitle lang="ENG">Computer technologies in comparative analysis of translation</artTitle>
        </artTitles>
        <abstracts>
          <abstract lang="ENG">The article concerns the resources of computer technologies in the comparative analysis of original and translated texts. The case study was a multilingual corpus based on two texts of different genres by V.V. Nabokov: an excerpt from the lectures on foreign literature devoted to the analysis of the novel “Madame Bovary” by Gustave Flaubert, as an example of Nabokov’s non-fiction prose, with its translations into Russian, French and German; and Nabokov’s first novel “Mary”, written in Russian, with its translation into English, edited by the author himself, and the novel’s subsequent translations into French and German. The study involved a comparative analysis of the results obtained through the same method for texts that differ in style and in the authorship of the translation (a lecture / fiction text from the early period of the author’s creative work, translated by independent translators with and without the author’s participation). From the upper zone of the frequency distributions obtained by software methods, it was possible to select the lexemes that, according to the results of the literary analysis, best reflect the main themes of the works. In the analysis of translation equivalents, statistics of coincidences of other lexemes with the analyzed lexeme in pairs of languages (English–Russian, French–Russian, German–Russian) and their contextual associations in the text were considered. This case study is the first attempt to analyze, using applied mathematical and computer methods, the ways in which translators achieve equivalence when creatively interpreting the meaning of the source text and setting their own associative fields to reveal the content of the author’s themes.</abstract>
        </abstracts>
        <codes>
          <doi>10.18721/JHSS.15304</doi>
          <udk>81.32; 81.33</udk>
        </codes>
        <keywords>
          <kwdGroup lang="ENG">
            <keyword>Nabokov</keyword>
            <keyword>computer technologies</keyword>
            <keyword>equivalence</keyword>
            <keyword>frequency dictionary</keyword>
            <keyword>translation studies</keyword>
            <keyword>Russian literature</keyword>
          </kwdGroup>
        </keywords>
        <files>
          <furl>https://human.spbstu.ru/article/2024.57.4/</furl>
          <file>35-46.pdf</file>
        </files>
      </article>
      <article>
        <artType>RAR</artType>
        <langPubl>RUS</langPubl>
        <pages>47-59</pages>
        <authors>
          <author num="001">
            <authorCodes>
              <orcid>0000-0002-7690-8379</orcid>
            </authorCodes>
            <individInfo lang="ENG">
              <orgName>Volgograd State Social-Pedagogical University</orgName>
              <surname>Dekatova</surname>
              <initials>Kristina</initials>
              <email>dekatovaki@mail.ru</email>
              <address>Volgograd, Russian Federation</address>
            </individInfo>
          </author>
        </authors>
        <artTitles>
          <artTitle lang="ENG">The contextual interaction of phraseological units and stylistic devices as a complex stylistic phenomenon</artTitle>
        </artTitles>
        <abstracts>
          <abstract lang="ENG">The article focuses on the functioning of expressive language means in a literary text. The aim of the study is to analyze two types of interaction between phraseological units and stylistic devices: phraseostylistic convergence and contamination. The practical research material is drawn from poetic texts of the 20th and 21st centuries. The descriptive method and the method of contextual analysis were used in the course of the research. The study has resulted in a description of the specific features of equipollent and derivational phraseostylistic contextual interaction. The analysis of the structure of phraseostylistic convergents and contaminants made it possible to divide them into two-component and multicomponent constructions. The author describes models of interaction of phraseological units and stylistic devices in multicomponent constructions.</abstract>
        </abstracts>
        <codes>
          <doi>10.18721/JHSS.15305</doi>
          <udk>811.161.1’38</udk>
        </codes>
        <keywords>
          <kwdGroup lang="ENG">
            <keyword>stylistic devices</keyword>
            <keyword>phraseologisms</keyword>
            <keyword>phraseostylistic convergence</keyword>
            <keyword>phraseostylistic contamination</keyword>
            <keyword>poetic text</keyword>
          </kwdGroup>
        </keywords>
        <files>
          <furl>https://human.spbstu.ru/article/2024.57.5/</furl>
          <file>47-59.pdf</file>
        </files>
      </article>
      <article>
        <artType>RAR</artType>
        <langPubl>RUS</langPubl>
        <pages>60-73</pages>
        <authors>
          <author num="001">
            <individInfo lang="ENG">
              <orgName>Minsk State Linguistic University</orgName>
              <surname>Panteleenko</surname>
              <initials>Alessia</initials>
              <email>alessia@list.ru</email>
              <address>Minsk, Republic of Belarus</address>
            </individInfo>
          </author>
        </authors>
        <artTitles>
          <artTitle lang="ENG">Attitudes towards language in a polylingual communicative space: a key to the identity of Italian borderland residents</artTitle>
        </artTitles>
        <abstracts>
          <abstract lang="ENG">The article is devoted to the issue of human attitudes to language in a polylingual communicative space. This issue, related to a person’s ability to preserve national-cultural and personal self-identification and to find tools that ensure mutual understanding in society, is especially relevant now, at a time of powerful geopolitical transformations accompanied by multidirectional processes of globalisation and increased attention to the uniqueness of regions. The aim of the article is to identify the mechanisms of linguistic realisation of the identity of the inhabitants of the Italian region of Valle d’Aosta, where the minority language Francoprovençal is actively used against the background of official French-Italian bilingualism. The study is based on a sociolinguistic survey conducted in October 2023 as part of a joint project between academic centres in Italy and Germany: the Centre for Francoprovençal Studies and the Institute of Romance Languages at the Humboldt University of Berlin. The sample consists of responses from residents born in the commune of San Nicola between 1923 and 2009 and registered there at the time of the survey. This paper analyses the respondents’ answers to the question “Which language do you think best corresponds to the following statements?”, taking into account different communicative situations. This direct questionnaire methodology made it possible to obtain detailed answers to the question posed, the analysis of which is presented in the section “Results of the study (discussion)”. The data obtained indicate that the use of French, Francoprovençal and Italian in the speech of Valdostans depends on attitudes towards language. The linguistic mechanisms of identity construction of Valdostans are examined with reference to intimate and status institutionalised contexts. More broadly, the results of the study are significant for understanding the processes taking place in the contact zones of closely related languages, for example, Russian and Belarusian.</abstract>
        </abstracts>
        <codes>
          <doi>10.18721/JHSS.15306</doi>
          <udk>811.13’282.3</udk>
        </codes>
        <keywords>
          <kwdGroup lang="ENG">
            <keyword>Italian</keyword>
            <keyword>French</keyword>
            <keyword>Francoprovençal</keyword>
            <keyword>domain</keyword>
            <keyword>sociolinguistics</keyword>
            <keyword>language situation</keyword>
          </kwdGroup>
        </keywords>
        <files>
          <furl>https://human.spbstu.ru/article/2024.57.6/</furl>
          <file>60-73.pdf</file>
        </files>
      </article>
      <article>
        <artType>RAR</artType>
        <langPubl>RUS</langPubl>
        <pages>74-80</pages>
        <authors>
          <author num="001">
            <individInfo lang="ENG">
              <orgName>Vinogradov Russian Language Institute of the Russian Academy of Sciences</orgName>
              <surname>Severskaya</surname>
              <initials>Olga</initials>
              <email>oseverskaya@yandex.ru</email>
              <address>Moscow, Russian Federation</address>
            </individInfo>
          </author>
          <author num="002">
            <individInfo lang="ENG">
              <orgName>Vinogradov Russian Language Institute of the Russian Academy of Sciences</orgName>
              <surname>Turkina</surname>
              <initials>Ekaterina</initials>
              <email>e.turkina@uniyar.ac.ru</email>
              <address>Moscow, Russian Federation</address>
            </individInfo>
          </author>
        </authors>
        <artTitles>
          <artTitle lang="ENG">Artificial intelligence in modern media discourse: strong, weak, personal</artTitle>
        </artTitles>
        <abstracts>
          <abstract lang="ENG">The article attempts to reconstruct the linguistic image of new technologies in current media discourse, where the discussion of neural networks and AI has become the main trend. The authors apply complex discursive, corpus and content analysis in structuring semantic fields and microfields in the thematic group “Artificial Intelligence”. According to the data obtained, the nodes of the thematic field of AI presented in the media are the clusters “technology”, “algorithm of intellectual activity”, “operating system” and “actor competing with a person”. The analysis of collocations made it possible to determine the conceptualization of AI in the social, economic, scientific, technological and creative spheres. There is a significant opposition between intelligence and reason (artificial and machine). AI appears in three guises in the analyzed contexts: strong, weak, personal. Strong AI prevails, as evidenced mainly by the subject position occupied by the nomination. In the media discourse, the machine is personified and endowed, as indicated by combinability and contextual synonymy, with reason, consciousness and subconsciousness, memory and feelings, turning into a global brain capable of making decisions and creating new intellectual values. The authors see further prospects for research in the analysis of the intersections of the thematic groups “Artificial Intelligence”, “Science and Technology” and “Risks”.</abstract>
        </abstracts>
        <codes>
          <doi>10.18721/JHSS.15307</doi>
          <udk>811.161.1'37</udk>
        </codes>
        <keywords>
          <kwdGroup lang="ENG">
            <keyword>artificial intelligence</keyword>
            <keyword>media discourse</keyword>
            <keyword>lexical-semantic field</keyword>
            <keyword>discursive and corpus analysis</keyword>
            <keyword>media picture of the world</keyword>
          </kwdGroup>
        </keywords>
        <files>
          <furl>https://human.spbstu.ru/article/2024.57.7/</furl>
          <file>74-80.pdf</file>
        </files>
      </article>
      <article>
        <artType>RAR</artType>
        <langPubl>RUS</langPubl>
        <pages>81-98</pages>
        <authors>
          <author num="001">
            <authorCodes>
              <orcid>0000-0002-8595-4756</orcid>
            </authorCodes>
            <individInfo lang="ENG">
              <orgName>Volgograd State Social-Pedagogical University</orgName>
              <surname>Shatskaya</surname>
              <initials>Marina</initials>
              <email>marina.schatzckaya@yandex.ru</email>
              <address>Volgograd, Russian Federation</address>
            </individInfo>
          </author>
          <author num="002">
            <authorCodes>
              <orcid>0000-0001-9661-5904</orcid>
            </authorCodes>
            <individInfo lang="ENG">
              <orgName>Volgograd State Social-Pedagogical University</orgName>
              <surname>Chernitsyna</surname>
              <initials>Tatyana</initials>
              <address>Volgograd, Russian Federation</address>
            </individInfo>
          </author>
        </authors>
        <artTitles>
          <artTitle lang="ENG">Comic provocation and provocation by comedianism: two sides of one communicative and pragmatic phenomenon</artTitle>
        </artTitles>
        <abstracts>
          <abstract lang="ENG">The study of comedy in a communicative and pragmatic aspect, in particular its correlation with provocation, one of the varieties of speech influence, is relevant because the two phenomena are functionally similar: both rest on the interplay of the expected and the unexpected, on the unpredictability of illocution, and on an axiological parameter. The purpose of the research underlying the article was to study provocative communication based on comedy or causing comedy in the fiction of V. Shukshin and S. Dovlatov, who use provocation as one of the means of creating comedy. We have identified the emotional and semantic dominants of communicative provocation in the authors' texts. Through a conversational and linguopragmatic analysis of how linguistic signs function in the speech of the provocateur and the addressee (the victim of provocation), and of their interaction in the act of communication, we reveal the plot-forming function of provocation in the stories of V. Shukshin (we propose to call it comic provocation) and its culminating function in the microdialogues of S. Dovlatov's prose (we call this type provocation by comedianism). There are similarities and differences in the parameters of 1) the main tactics used by the provocateurs (irony, discrediting and provocative questions are shared), 2) the reactions of their victims (hyperreaction and hyporeaction), 3) the completeness of the provocation (in the works of V. Shukshin it is present; in S. Dovlatov, the assessment of the completeness of the provocation is left to the reader). In addition, special types of provocation were found in the works of S. Dovlatov: auto-provocation and unintentional provocation.</abstract>
        </abstracts>
        <codes>
          <doi>10.18721/JHSS.15308</doi>
          <udk>81-2</udk>
        </codes>
        <keywords>
          <kwdGroup lang="ENG">
            <keyword>comic</keyword>
            <keyword>provocation</keyword>
            <keyword>verbal impact</keyword>
            <keyword>tactics</keyword>
            <keyword>ideal and ideological provocateurs</keyword>
          </kwdGroup>
        </keywords>
        <files>
          <furl>https://human.spbstu.ru/article/2024.57.8/</furl>
          <file>81-98.pdf</file>
        </files>
      </article>
      <article>
        <artType>RAR</artType>
        <langPubl>RUS</langPubl>
        <pages>99-108</pages>
        <authors>
          <author num="001">
            <authorCodes>
              <orcid>0000-0002-6841-2269</orcid>
            </authorCodes>
            <individInfo lang="ENG">
              <orgName>St. Petersburg State University</orgName>
              <surname>Shramko</surname>
              <initials>Ludmila</initials>
              <email>l.shramko@spbu.ru</email>
              <address>St. Petersburg, Russian Federation</address>
            </individInfo>
          </author>
        </authors>
        <artTitles>
          <artTitle lang="ENG">Strategies and tactics of creating a politician’s image in the modern media (on the example of English-speaking quality press)</artTitle>
        </artTitles>
        <abstracts>
          <abstract lang="ENG">Nowadays all media sources are employed as an effective means of propaganda, broadcasting certain political ideas and issues. The paper examines quality press texts belonging to political discourse that appeared in the sections “Opinion”, “Politics”, and “Editorial” from the beginning of 2022 to April 2024. We show that, owing to its well-established reputation as a reliable information provider, the quality press is one of the key players in the present media market: it is actively involved in the most current political events, takes an active part in forming and maintaining a politician’s media image, and manipulates public opinion. The research aims at identifying and describing the discursive strategies and tactics employed for forming and maintaining the positive or negative image of a politician in the English-speaking quality press. The texts were analyzed using the method of critical discourse analysis; the theory of semantic fields was also used for viewing and describing them. The results suggest that the “intensify” strategy is used to present a favorable image. Presentation tactics aimed at highlighting positive qualities and actions are often used to implement the “intensify” strategy. The “downplay” strategy, implemented by accusation and offense tactics, is often used for creating a negative media image. Quality press papers tend to publish negative forecasts about the actions of political leaders or the results of their work, which can lead to a negative evaluation of such events and, in turn, adversely affect the overall assessment of the politician’s image. The strategy of phantom threat creates a gap between politicians and other people, depriving political leaders of the support of their electorate. The research shows that the ability to exert a significant manipulative influence on the image of a politician makes the quality press an effective tool in political battles.</abstract>
        </abstracts>
        <codes>
          <doi>10.18721/JHSS.15309</doi>
          <udk>811.111</udk>
        </codes>
        <keywords>
          <kwdGroup lang="ENG">
            <keyword>quality press</keyword>
            <keyword>critical discourse analysis</keyword>
            <keyword>semantic field</keyword>
            <keyword>media image</keyword>
            <keyword>manipulation</keyword>
            <keyword>strategy</keyword>
            <keyword>political discourse</keyword>
          </kwdGroup>
        </keywords>
        <files>
          <furl>https://human.spbstu.ru/article/2024.57.9/</furl>
          <file>99-108.pdf</file>
        </files>
      </article>
      <article>
        <artType>RAR</artType>
        <langPubl>RUS</langPubl>
        <pages>109-123</pages>
        <authors>
          <author num="001">
            <authorCodes>
              <orcid>0000-0002-1733-6075</orcid>
            </authorCodes>
            <individInfo lang="ENG">
              <orgName>South Ural State University</orgName>
              <surname>Babina</surname>
              <initials>Olga</initials>
              <email>babinaoi@susu.ru</email>
              <address>Chelyabinsk, Russian Federation</address>
            </individInfo>
          </author>
          <author num="002">
            <authorCodes>
              <orcid>0000-0002-7658-7376</orcid>
            </authorCodes>
            <individInfo lang="ENG">
              <orgName>South Ural State University</orgName>
              <surname>Zinoveva</surname>
              <initials>Anastasia</initials>
              <address>Chelyabinsk, Russian Federation</address>
            </individInfo>
          </author>
          <author num="003">
            <individInfo lang="ENG">
              <orgName>South Ural State University</orgName>
              <surname>Nerucheva</surname>
              <initials>Ekaterina</initials>
              <email>neruchevaed@susu.ru</email>
              <address>Chelyabinsk, Russian Federation</address>
            </individInfo>
          </author>
        </authors>
        <artTitles>
          <artTitle lang="ENG">Dataset preprocessing effects on Bi-LSTM-based concept tagging of text tokens</artTitle>
        </artTitles>
        <abstracts>
          <abstract lang="ENG">The paper considers the problem of natural language dataset preprocessing to improve neural network model performance. The aim of the study is to identify the dataset preprocessing parameters that ensure higher performance of a model that correlates textual input (a sequence of lexical units) with semantic, or conceptual, classes, i.e. concept tagging. Our methodology includes: a) modeling conceptual annotation of textual units, b) experimenting with textual dataset preprocessing options. The model that we propose takes as input tokens (in lowercase) representing words and multi-component lexical units (phrases), some of which are related to domain concepts. Since each token may refer to several conceptual classes, the concept tagging task is treated as a multi-label classification problem. In this research, we deal with a corpus of news reports on terrorist attacks in English. We experimented with preprocessing the corpus-based dataset by: a) lemmatizing tokens, b) removing stop words, and c) including sentence separators as individual tokens in the model vocabulary. The multi-label classification model used for the training experiments was a neural network that constructs sequences of lexical unit embeddings and feeds them into a bidirectional long short-term memory (Bi-LSTM) model. The experimental results show that the dataset preprocessed according to all the above-mentioned procedures demonstrated the highest micro-, macro- and weighted averaged F1-scores. The per-class F1-score on the test dataset reaches 88% for the class characterized by high frequency and low lexical variability in the training, validation, and test samples. The novelty of the paper lies in the proposed approach to content analysis of news reports on terrorist attacks using the proposed multi-label classification model. New results were obtained by experimenting with differently preprocessed corpora of news reports on terrorist attacks. The proposed method may be used for content analysis of news reports specific to other subject areas.</abstract>
        </abstracts>
        <codes>
          <doi>10.18721/JHSS.15310</doi>
          <udk>004.8 + 81'33</udk>
        </codes>
        <keywords>
          <kwdGroup lang="ENG">
            <keyword>semantic tagging</keyword>
            <keyword>natural language processing</keyword>
            <keyword>Bi-LSTM</keyword>
            <keyword>multi-label classification</keyword>
            <keyword>data preprocessing</keyword>
            <keyword>news corpus</keyword>
            <keyword>terrorism</keyword>
          </kwdGroup>
        </keywords>
        <files>
          <furl>https://human.spbstu.ru/article/2024.57.10/</furl>
          <file/>
        </files>
      </article>
    </articles>
  </issue>
</journal>
