<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "https://jats.nlm.nih.gov/publishing/1.3/JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xml:lang="ru">
  <front xmlns:xlink="http://www.w3.org/1999/xlink">
    <journal-meta>
      <journal-id journal-id-type="elibrary">80301</journal-id>
      <journal-title-group>
        <journal-title>Terra Linguistica</journal-title>
        <trans-title-group xml:lang="ru">
          <trans-title>Terra Linguistica</trans-title>
        </trans-title-group>
      </journal-title-group>
      <issn pub-type="epub">2782-5450</issn>
    </journal-meta>
    <article-meta xmlns:xlink="http://www.w3.org/1999/xlink">
      <article-id pub-id-type="publisher-id">4</article-id>
      <article-id pub-id-type="doi">10.18721/JHSS.14104</article-id>
      <title-group>
        <article-title>Studying the impact of morphological parameters on text readability using statistical analysis methods</article-title>
        <trans-title-group xml:lang="ru">
          <trans-title>Исследование влияния параметров морфологической сложности на трудность восприятия медиатекста с использованием методов статистического анализа данных</trans-title>
        </trans-title-group>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">0000-0001-5338-3656</contrib-id>
          <name>
            <surname>Evtushenko</surname>
            <given-names>Tatiana</given-names>
          </name>
          <xref ref-type="aff" rid="aff1"/>
          <email>evtushenkotg@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">0000-0002-6326-8392</contrib-id>
          <contrib-id contrib-id-type="scopus">57189038663</contrib-id>
          <contrib-id contrib-id-type="researcherid">6523-2016</contrib-id>
          <name>
            <surname>Klochkova</surname>
            <given-names>Yelena</given-names>
          </name>
          <xref ref-type="aff" rid="aff2"/>
          <email>esklochkova@etu.ru</email>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Laputenko</surname>
            <given-names>Andrey</given-names>
          </name>
          <xref ref-type="aff" rid="aff3"/>
          <email>laputenko.av@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">0000-0002-4006-1161</contrib-id>
          <name>
            <surname>Evtushenko</surname>
            <given-names>Nina</given-names>
          </name>
          <xref ref-type="aff" rid="aff4"/>
          <email>evtushenko@ispras.ru</email>
        </contrib>
      </contrib-group>
      <aff id="aff1">Peter the Great St. Petersburg Polytechnic University</aff>
      <aff id="aff2">St. Petersburg Electrotechnical University “LETI”</aff>
      <aff id="aff3">National Research Tomsk State University</aff>
      <aff id="aff4">Institute for System Programming of the Russian Academy of Sciences</aff>
      <pub-date publication-format="electronic" date-type="pub" iso-8601-date="2023-03-31">
        <day>31</day>
        <month>03</month>
        <year>2023</year>
      </pub-date>
      <volume>14</volume>
      <issue>1</issue>
      <fpage>30</fpage>
      <lpage>40</lpage>
      <self-uri xmlns:xlink="http://www.w3.org/1999/xlink" content-type="pdf" xlink:href="https://human.spbstu.ru/userfiles/files/articles/2023/1/30-40.pdf"/>
      <abstract xml:lang="en">
        <p>The paper addresses an important aspect of text complexity: the dependence of text readability on a set of morphological and text surface metrics, such as average word length and average sentence length. We analyze the correlation between objective text complexity, which is specified by quantitative parameters of linguistic features, and subjective text complexity, i.e. the difficulty of text comprehension as a psychological phenomenon. To assess morphological text complexity, we used an annotated dataset of 1000 online news texts (140,000 tokens) retrieved from the websites of Russian universities. For each text unit, the share of each part of speech among the tokens was measured. The news texts in the dataset were also assessed by the target audience of the websites, i.e. applicants, undergraduate and postgraduate students. As a result, the dataset was automatically annotated with linguistic features and human-labelled with experts’ estimates of text readability on a 5-point scale. To assess the significance of morphological metrics and their influence on text readability, correlation and regression analyses were carried out. To automatically classify a text as ‘easy-to-read’ or not, both single-feature models and compound models including more than one metric were constructed. In agreement with prior research, the metrics that most commonly influence text readability appear to be text surface characteristics. However, the proposed models also established the significance of morphological parameters, used in both single-feature and compound models, such as the use of participles, nouns in the genitive case, adjectives and numerals; these should be taken into account when analyzing the readability of news texts. Moreover, novel readability formulae based on the studied coefficients are proposed.</p>
      </abstract>
      <kwd-group xml:lang="en">
        <kwd>text complexity</kwd>
        <kwd>readability</kwd>
        <kwd>morphological features</kwd>
        <kwd>media text</kwd>
        <kwd>correlation and regression analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
</article>
